llama.cpp/scripts
Max Krasnyansky cff777f226
hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)
* hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars

* hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32

Factore out all hvx_copy functions into hvx-copy.h header and reduced code duplication.
Update HTP ops infra to support OP_CPY

* hexagon: cleanup and refactor hex/hvx/htp headers and helper libs

hex is basically all scalar/core platform stuff (L2, DMA, basic utils)
hvx is all hvx related utils, helpers, etc
htp is higher level stuff like Ops, etc

hvx-utils library got a nice round of cleanup and refactoring to reduce duplication

use hvx_vec_store_a where possible

* hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h

Moved sigmoid and tanh vector functions from hvx-utils.h to a new header
hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid
array processing using a macro pattern similar to hvx-copy.h. Updated
act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed
unused hvx-sigmoid.c.

* hexagon: factor out hvx-sqrt.h

* hexagon: mintor update to hvx-utils.h

* hexagon: remove spurios log

* hexagon: factor out and optimize hvx_add/sub/mul

* hexagon: remove _opt variants of add/sub/mul as they simply fully aligned versions

* hexagon: refactor reduction functions to hvx-reduce.h

Moved `hvx_self_max_f32` and `hvx_self_sum_f32` from `hvx-utils.h`/`.c` to `hvx-reduce.h`.
Renamed them to `hvx_reduce_max_f32` and `hvx_reduce_sum_f32`.
Added aligned (`_a`) and unaligned (`_u`) variants and used macros to unify logic.
Updated `softmax-ops.c` to use the new functions.

* hexagon: refactor the rest of arithmetic functions to hvx-arith.h

Moved `hvx_sum_of_squares_f32`, `hvx_min_scalar_f32`, and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` to use `dst, src, ..., n` argument order. Updated call sites in `act-ops.c`.

Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h

Moved `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated these functions to use `dst, src, ..., n` argument order and updated call sites in `act-ops.c`. `hvx_sum_of_squares_f32` remains in `hvx-utils.c` as requested.

* hexagon: refactor hvx_sum_of_squares_f32

- Modify `hvx_sum_of_squares_f32` in `ggml/src/ggml-hexagon/htp/hvx-reduce.h` to use `dst, src` signature.
- Implement `_a` (aligned) and `_u` (unaligned) variants for `hvx_sum_of_squares_f32`.
- Update `hvx_reduce_loop_body` macro to support both returning and storing results via `finalize_op`.
- Update existing reduction functions in `hvx-reduce.h` to use the updated macro.
- Update `rms_norm_htp_f32` in `ggml/src/ggml-hexagon/htp/unary-ops.c` to match the new signature.

* hexagon: use hvx_splat instead of memset

* hexagon: consistent use of f32/f16 in all function names to match the rest of GGML

* hexagon: fix hvx_copy_f16_f32 on v75 and older

* hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL

* scripts: update snapdragon/adb scripts to enable host param
2026-01-14 21:46:12 -08:00
..
apple scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
jinja scripts : add Jinja tester PySide6 simple app (#15756) 2025-09-05 01:05:12 +02:00
snapdragon hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
bench-models.sh scripts : add script to bench models (#16894) 2025-11-02 00:15:31 +02:00
build-info.sh llama : reorganize source code + improve CMake (#8006) 2024-06-26 18:33:02 +03:00
check-requirements.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
compare-commits.sh scripts: add sqlite3 check for compare-commits.sh (#15633) 2025-08-28 19:23:22 +08:00
compare-llama-bench.py scripts: strip "AMD Instinct" from GPU name (#15668) 2025-08-29 22:04:08 +02:00
compare-logprobs.py scripts: add script to compare logprobs of llama.cpp against other frameworks (#17947) 2025-12-13 22:33:29 +01:00
create_ops_docs.py Docs: add instructions for adding backends (#14889) 2025-07-27 09:36:43 +08:00
debug-test.sh refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
fetch_server_test_models.py llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00
gen-authors.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
gen-unicode-data.py py : type-check all Python scripts with Pyright (#8341) 2024-07-07 15:04:39 -04:00
get-flags.mk build : pass all warning flags to nvcc via -Xcompiler (#5570) 2024-02-18 16:21:52 -05:00
get-hellaswag.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
get-pg.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
get-wikitext-2.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
get-wikitext-103.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
get-winogrande.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
get_chat_template.py scripts: corrected encoding when getting chat template (#11866) (#11907) 2025-02-18 10:30:16 +01:00
hf.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
install-oneapi.bat support SYCL backend windows build (#5208) 2024-01-31 08:08:07 +05:30
pr2wt.sh scripts : follow api redirects in pr2wt.sh (#18739) 2026-01-10 16:04:05 +01:00
serve-static.js refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
server-bench.py llama: use FA + max. GPU layers by default (#15434) 2025-08-30 16:32:10 +02:00
sync-ggml-am.sh scripts : update sync scripts 2025-08-18 22:06:44 +03:00
sync-ggml.last sync : ggml 2025-12-31 18:54:43 +02:00
sync-ggml.sh scripts : update sync scripts 2025-08-18 22:06:44 +03:00
sync_vendor.py vendor : update cpp-httplib to 0.30.1 (#18771) 2026-01-12 15:58:52 +01:00
tool_bench.py refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
tool_bench.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
verify-checksum-models.py convert.py : add python logging instead of print() (#6511) 2024-05-03 22:36:41 +03:00
xxd.cmake llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00