llama.cpp

Max Krasnyansky cff777f226 hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)

* hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars
* hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32
  Factored out all hvx_copy functions into hvx-copy.h header and reduced code duplication. Updated HTP ops infra to support OP_CPY.
* hexagon: cleanup and refactor hex/hvx/htp headers and helper libs
  hex is basically all scalar/core platform stuff (L2, DMA, basic utils); hvx is all hvx related utils, helpers, etc.; htp is higher level stuff like Ops, etc. The hvx-utils library got a nice round of cleanup and refactoring to reduce duplication. Use hvx_vec_store_a where possible.
* hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h
  Moved sigmoid and tanh vector functions from hvx-utils.h to a new header hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid array processing using a macro pattern similar to hvx-copy.h. Updated act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed unused hvx-sigmoid.c.
* hexagon: factor out hvx-sqrt.h
* hexagon: minor update to hvx-utils.h
* hexagon: remove spurious log
* hexagon: factor out and optimize hvx_add/sub/mul
* hexagon: remove _opt variants of add/sub/mul as they are simply fully aligned versions
* hexagon: refactor reduction functions to hvx-reduce.h
  Moved `hvx_self_max_f32` and `hvx_self_sum_f32` from `hvx-utils.h`/`.c` to `hvx-reduce.h`. Renamed them to `hvx_reduce_max_f32` and `hvx_reduce_sum_f32`. Added aligned (`_a`) and unaligned (`_u`) variants and used macros to unify logic. Updated `softmax-ops.c` to use the new functions.
* hexagon: refactor the rest of arithmetic functions to hvx-arith.h
  Moved `hvx_sum_of_squares_f32`, `hvx_min_scalar_f32`, and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` to use `dst, src, ..., n` argument order. Updated call sites in `act-ops.c`.
  Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h: moved `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated these functions to use `dst, src, ..., n` argument order and updated call sites in `act-ops.c`. `hvx_sum_of_squares_f32` remains in `hvx-utils.c` as requested.
* hexagon: refactor hvx_sum_of_squares_f32
  - Modify `hvx_sum_of_squares_f32` in `ggml/src/ggml-hexagon/htp/hvx-reduce.h` to use `dst, src` signature.
  - Implement `_a` (aligned) and `_u` (unaligned) variants for `hvx_sum_of_squares_f32`.
  - Update `hvx_reduce_loop_body` macro to support both returning and storing results via `finalize_op`.
  - Update existing reduction functions in `hvx-reduce.h` to use the updated macro.
  - Update `rms_norm_htp_f32` in `ggml/src/ggml-hexagon/htp/unary-ops.c` to match the new signature.
* hexagon: use hvx_splat instead of memset
* hexagon: consistent use of f32/f16 in all function names to match the rest of GGML
* hexagon: fix hvx_copy_f16_f32 on v75 and older
* hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL
* scripts: update snapdragon/adb scripts to enable host param

2026-01-14 21:46:12 -08:00
ggml-blas	cmake : update blas logic (#18205)	2026-01-10 18:00:54 +02:00
ggml-cann	ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)	2026-01-08 11:03:21 +02:00
ggml-cpu	kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458)	2025-12-30 14:04:53 +02:00
ggml-cuda	CUDA: Factor out and re-use `block_reduce` function (#18785)	2026-01-15 10:44:54 +08:00
ggml-hexagon	hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)	2026-01-14 21:46:12 -08:00
ggml-hip	HIP: fix AMDGPU_TARGETS, update documentation (#16803)	2025-10-27 21:39:49 +01:00
ggml-metal	ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705)	2026-01-14 09:22:25 +02:00
ggml-musa	CUDA: faster tile FA, add oob checks, more HSs (#16492)	2025-10-11 20:54:32 +02:00
ggml-opencl	opencl: add SOFTPLUS op support (#18726)	2026-01-10 21:57:44 -08:00
ggml-rpc	rpc : use unordered_map::reserve and emplace (#18513)	2026-01-02 12:09:36 +02:00
ggml-sycl	ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)	2026-01-08 11:03:21 +02:00
ggml-vulkan	vulkan: Check maxStorageBufferRange in supports_op (#18709)	2026-01-14 10:59:05 +01:00
ggml-webgpu	Updates to webgpu get_memory (#18707)	2026-01-09 08:17:18 -08:00
ggml-zdnn	zdnn: refactor codebase + add docs (#16178)	2025-09-23 14:53:05 +08:00
ggml-zendnn	ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)	2025-12-07 00:13:33 +08:00
CMakeLists.txt	kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458)	2025-12-30 14:04:53 +02:00
ggml-alloc.c	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)	2025-12-15 09:24:59 +01:00
ggml-backend-impl.h	llama: use host memory if device reports 0 memory (#18587)	2026-01-09 05:34:56 +08:00
ggml-backend-reg.cpp	ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690)	2025-12-07 00:13:33 +08:00
ggml-backend.cpp	vulkan: extend topk_moe to handle sigmoid w/exp_probs_b for nemotron (#18295)	2026-01-01 08:58:27 +01:00
ggml-common.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-impl.h	cmake: Added more x86_64 CPU backends when building with `GGML_CPU_ALL_VARIANTS=On` (#18186)	2025-12-28 09:33:29 +02:00
ggml-opt.cpp	finetune: SGD optimizer, more CLI args (#13873)	2025-08-14 12:03:57 +02:00
ggml-quants.c	ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928)	2025-09-23 10:25:20 +02:00
ggml-quants.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-threading.cpp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-threading.h	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)	2024-12-12 19:02:49 +01:00
ggml.c	ggml : fix avx512bf16 build (#18623)	2026-01-06 08:54:10 +02:00
ggml.cpp	ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)	2025-06-01 13:43:57 +03:00
gguf.cpp	ggml, llama : use defaulted constructors/destructors (#17649)	2025-12-03 07:12:18 +01:00