llama.cpp/ggml/src/ggml-hexagon/htp
Max Krasnyansky 609ea50026
hexagon: Q4_0 and MXFP4 repack fixes (#20527)
* hexagon: fix tail corruption with row sizes that are not a multiple of 256

* hexagon: use different stride for repacking partial blocks

* hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks

The previous commit changed the repacking to use even:odd (0:1,2:3,...) packing
instead of the original (0:128,1:129,...) packing in order to fix tail corruption.
Since the mm kernels already deal with partial tails, we can use even:odd
packing only for the last block.
This avoids the performance penalty of having to shuffle to zip the elements
in the common case.

* hex-mm: update rmpy x8 for better optimizations

* hex-mm: tighten supported MUL_MAT checks to avoid spurious failures

* hex-mm: use vzero to init accumulators

* hex-mm: properly call partial rmpy_x8
2026-03-14 11:09:08 -07:00
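
The two packing orders named in the commit message above can be illustrated with a small scalar sketch. This is not the code from matmul-ops.c: the function names (pack_split, pack_even_odd, pack_row), the low/high nibble assignment, and the assumption that the tail length is even are all hypothetical, chosen only to show why a full 256-element block can keep the (0:128,1:129,...) order while a partial last block uses (0:1,2:3,...).

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 256 /* elements per full block, as stated in the commit message */

/* Full block, (0:128,1:129,...) order: byte i holds element i in the low
 * nibble and element i + 128 in the high nibble (nibble order is an assumption). */
static void pack_split(const uint8_t *q, uint8_t *out) {
    for (size_t i = 0; i < BLOCK / 2; i++) {
        out[i] = (uint8_t)((q[i] & 0x0F) | ((q[i + BLOCK / 2] & 0x0F) << 4));
    }
}

/* Partial tail, (0:1,2:3,...) order: byte i holds elements 2i and 2i + 1,
 * so n elements occupy exactly n / 2 contiguous bytes (n assumed even). */
static void pack_even_odd(const uint8_t *q, size_t n, uint8_t *out) {
    for (size_t i = 0; i < n / 2; i++) {
        out[i] = (uint8_t)((q[2 * i] & 0x0F) | ((q[2 * i + 1] & 0x0F) << 4));
    }
}

/* Hypothetical row packer: split order for every full block, even:odd order
 * only for the last partial block. Returns the number of bytes written. */
static size_t pack_row(const uint8_t *q, size_t n, uint8_t *out) {
    size_t done = 0, bytes = 0;
    while (n - done >= BLOCK) {
        pack_split(q + done, out + bytes);
        done += BLOCK;
        bytes += BLOCK / 2;
    }
    if (done < n) {
        pack_even_odd(q + done, n - done, out + bytes);
        bytes += (n - done) / 2;
    }
    return bytes;
}
```

In this sketch the split order pairs elements that sit 128 positions apart, which only makes sense when all 256 elements exist; the even:odd order needs no such partner element, so it fits a short tail without reading or writing past the end of the row.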
..
CMakeLists.txt hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
act-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
argsort-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
binary-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
cmake-toolchain.cmake Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) 2025-10-22 13:47:09 -07:00
cpy-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
flash-attn-ops.c hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and MatMul updates (#20118) 2026-03-04 21:55:29 -08:00
get-rows-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
hex-dma.c hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hex-dma.h hexagon refactor all Ops to use local context struct (#19819) 2026-02-23 16:32:14 -08:00
hex-dump.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hex-fastdiv.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hex-utils.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
htp-ctx.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
htp-msg.h hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
htp-ops.h hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
htp_iface.idl Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) 2025-10-22 13:47:09 -07:00
hvx-arith.h hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
hvx-base.h hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
hvx-copy.h hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and MatMul updates (#20118) 2026-03-04 21:55:29 -08:00
hvx-div.h hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
hvx-dump.h ggml-hexagon: flash-attention and reduce-sum optimizations (#19141) 2026-01-30 21:14:20 -08:00
hvx-exp.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hvx-floor.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hvx-inverse.h hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
hvx-reduce.h hexagon: Flash Attention optimizations (dma, mpyacc, multi-row) and MatMul updates (#20118) 2026-03-04 21:55:29 -08:00
hvx-scale.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hvx-sigmoid.h hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (#19406) 2026-02-10 23:21:12 -08:00
hvx-sqrt.h hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (#19406) 2026-02-10 23:21:12 -08:00
hvx-types.h hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) 2026-01-14 21:46:12 -08:00
hvx-utils.h hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
main.c hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
matmul-ops.c hexagon: Q4_0 and MXFP4 repack fixes (#20527) 2026-03-14 11:09:08 -07:00
rope-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
set-rows-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
softmax-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
ssm-conv.c hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
sum-rows-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
unary-ops.c hexagon: add fp16 support for binary ops: add,sub,mul,div (#20139) 2026-03-05 18:29:13 -08:00
worker-pool.c chore : correct typos [no ci] (#20041) 2026-03-05 08:50:21 +01:00
worker-pool.h Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) 2025-10-22 13:47:09 -07:00