llama.cpp/ggml/src/ggml-hexagon
Max Krasnyansky 609ea50026
hexagon: Q4_0 and MXFP4 repack fixes (#20527)
* hexagon: fix tail corruption with row sizes that are not a multiple of 256

* hexagon: use different stride for repacking partial blocks

* hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks

The previous commit changed the repacking to use even:odd (0:1, 2:3, ...) packing
instead of the original (0:128, 1:129, ...) packing in order to fix tail corruption.
Since the mm kernels already deal with partial tails, we can use even:odd
packing only for the last block.
This avoids the performance penalty of having to shuffle to zip the elements
in the common case.

* hex-mm: update rmpy x8 for better optimizations

* hex-mm: tighten supported MUL_MAT checks to avoid spurious failures

* hex-mm: use vzero to init accumulators

* hex-mm: properly call partial rmpy_x8
2026-03-14 11:09:08 -07:00
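The two packings named in the commit message can be illustrated with a minimal scalar sketch. It is not the HTP repack code: the function names are hypothetical, and the choice of low nibble for the first element of each pair is an assumption made for illustration. It only shows how 4-bit values pair up in the (0:128, 1:129, ...) layout versus the even:odd (0:1, 2:3, ...) layout, and how many bytes a partial block occupies in each.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK 256 /* elements in a full block, per the commit message */

/* (0:128, 1:129, ...) layout: byte i holds element i (low nibble, assumed)
 * and element i + 128 (high nibble). Needs a full 256-element block. */
static void pack_split_half(const uint8_t *q, uint8_t *out) {
    for (int i = 0; i < BLOCK / 2; i++) {
        out[i] = (uint8_t)((q[i] & 0x0F) | ((q[i + BLOCK / 2] & 0x0F) << 4));
    }
}

/* even:odd (0:1, 2:3, ...) layout: byte i holds elements 2i and 2i + 1.
 * Works for any n <= 256; an odd n leaves the last high nibble zero. */
static void pack_even_odd(const uint8_t *q, int n, uint8_t *out) {
    assert(n >= 0 && n <= BLOCK);
    for (int i = 0; i < (n + 1) / 2; i++) {
        uint8_t lo = (uint8_t)(q[2 * i] & 0x0F);
        uint8_t hi = (uint8_t)((2 * i + 1 < n) ? (q[2 * i + 1] & 0x0F) : 0);
        out[i] = (uint8_t)(lo | (hi << 4));
    }
}

/* Read element i back from each layout. */
static uint8_t get_split_half(const uint8_t *p, int i) {
    return (uint8_t)(i < BLOCK / 2 ? (p[i] & 0x0F) : (p[i - BLOCK / 2] >> 4));
}

static uint8_t get_even_odd(const uint8_t *p, int i) {
    return (uint8_t)((i & 1) ? (p[i / 2] >> 4) : (p[i / 2] & 0x0F));
}

/* Bytes that carry valid data when only the first n elements are valid. */
static int bytes_split_half(int n) {
    return n < BLOCK / 2 ? n : BLOCK / 2;
}

static int bytes_even_odd(int n) {
    return (n + 1) / 2;
}
```

In this sketch a 64-element tail fills 32 contiguous bytes in the even:odd layout, while the split-half layout spreads the same elements over 64 bytes with unused high nibbles. That is one plausible reading of why the commit keeps even:odd only for the last, partial block and the original layout for full blocks.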
..
htp hexagon: Q4_0 and MXFP4 repack fixes (#20527) 2026-03-14 11:09:08 -07:00
CMakeLists.txt ggml-hexagon: flash-attention and reduce-sum optimizations (#19141) 2026-01-30 21:14:20 -08:00
ggml-hexagon.cpp hexagon: Q4_0 and MXFP4 repack fixes (#20527) 2026-03-14 11:09:08 -07:00
htp-drv.cpp chore : correct typos [no ci] (#20041) 2026-03-05 08:50:21 +01:00
htp-drv.h hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
libdl.h hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
libggml-htp.inf hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
op-desc.h ggml-hexagon: create generalized functions for cpu side op (#17500) 2025-12-22 23:13:24 -08:00