llama.cpp/ggml/src/ggml-hexagon
Max Krasnyansky 0ccbfdef3e
hexagon: further optimizations and refactoring for flash attention (#19583)
* ggml-hexagon: fa improvements

ggml-hexagon: optimize flash attention calculations with improved variable handling

ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32

ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements

ggml-hexagon: optimize flash attention by changing slope vector type to F16

* hexfa: fixed test-backend-ops failures due to leftover element handling (see the tail-handling sketch below)

* hexagon: refactor and optimize fa to use a local context struct (see the struct sketch below)

* ggml-hexagon: optimize flash-attention using hvx_vec_expf

Use HVX for online softmax (a scalar sketch of the update follows below).

---------

Co-authored-by: chraac <chraac@gmail.com>
2026-02-13 16:27:30 -08:00
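
Two of the bullets above ("unused elements" in hvx_dot_f16_f16_aa_rx2 and the "leftover element handling" fix) concern the vector-tail problem: when a row length is not a multiple of the HVX vector width, the lanes past the end of the data must not contribute to the reduction. Below is a minimal scalar C sketch of the idea under assumed names; the function, the loop structure, and the lane count are illustrative, not the actual HVX intrinsic code, which masks or zeroes the dead lanes with predicates.

    // Illustrative tail handling for an F16 dot product. A 128-byte HVX
    // vector holds 64 F16 lanes; if n is not a multiple of 64, the last
    // vector covers lanes past the end of the row, and those must be
    // excluded from the sum. All names here are hypothetical, not the
    // ggml-hexagon source.
    static float dot_f16_tail_safe(const __fp16 *a, const __fp16 *b, int n) {
        enum { LANES = 64 };                 // F16 elements per HVX vector
        float sum = 0.0f;
        int   i   = 0;
        for (; i + LANES <= n; i += LANES) { // full vectors
            for (int j = 0; j < LANES; j++) {
                sum += (float) a[i + j] * (float) b[i + j];
            }
        }
        for (; i < n; i++) {                 // leftover elements only
            sum += (float) a[i] * (float) b[i];
        }
        return sum;
    }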
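
The "local context struct" refactor follows a common pattern: gather the per-call flash-attention parameters and running state into one struct so helper functions take a single pointer instead of long argument lists. A hypothetical sketch of the pattern in C; every field name is invented for illustration and does not reflect the actual ggml-hexagon layout:

    // Hypothetical "local context struct" for a flash-attention kernel:
    // one bundle of parameters and scratch state threaded through the
    // helpers. Field names are illustrative, not the ggml-hexagon source.
    struct fa_ctx {
        const __fp16 *q;      // query row(s), F16
        const __fp16 *k;      // key cache, F16
        const __fp16 *v;      // value cache, F16
        const __fp16 *mask;   // optional attention mask (may be NULL)
        float         scale;  // 1/sqrt(head_dim) applied to Q*K^T
        float         slope;  // ALiBi slope; per the commit, the
                              // broadcast slope vector is kept in F16
        int           dk, dv; // K and V head dimensions
        int           kv_len; // number of KV positions to attend over
        float        *out;    // output accumulator for this head
    };

    // Helpers then take the context instead of a dozen loose arguments:
    //   static void fa_process_block(struct fa_ctx *ctx,
    //                                int kv_start, int kv_end);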
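
The last bullet moves the softmax exponentials onto HVX via hvx_vec_expf. "Online softmax" is the streaming formulation used by flash attention: scores arrive in blocks, and a running maximum, denominator, and weighted V accumulator are rescaled whenever the maximum grows, so the full score row is never materialized. A minimal scalar C sketch of one update step under assumed names (the real kernel computes steps 2 and 3 a vector at a time with HVX):

    #include <math.h>

    // One online-softmax step over a block of n attention scores s[]
    // with matching V rows v[] of width dv. Maintains the running max m,
    // denominator d, and accumulator acc[]. Initialize with m = -INFINITY,
    // d = 0, acc[] = 0; after the last block the output is acc[j] / d.
    // Names are illustrative, not the ggml-hexagon source.
    static void online_softmax_step(const float *s, const float *v,
                                    int n, int dv,
                                    float *m, float *d, float *acc) {
        // 1. fold this block into the running maximum
        float m_new = *m;
        for (int i = 0; i < n; i++) {
            if (s[i] > m_new) m_new = s[i];
        }

        // 2. rescale earlier partial sums by exp(m_old - m_new)
        const float c = expf(*m - m_new);
        *d *= c;
        for (int j = 0; j < dv; j++) {
            acc[j] *= c;
        }

        // 3. accumulate this block; the HVX kernel evaluates these
        //    exponentials vector-wide with hvx_vec_expf
        for (int i = 0; i < n; i++) {
            const float p = expf(s[i] - m_new);
            *d += p;
            for (int j = 0; j < dv; j++) {
                acc[j] += p * v[i*dv + j];
            }
        }
        *m = m_new;
    }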
Name               Last commit                                                                    Date
htp/               hexagon: further optimizations and refactoring for flash attention (#19583)    2026-02-13 16:27:30 -08:00
CMakeLists.txt     ggml-hexagon: flash-attention and reduce-sum optimizations (#19141)            2026-01-30 21:14:20 -08:00
ggml-hexagon.cpp   hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (#19406)                 2026-02-10 23:21:12 -08:00
htp-drv.cpp        hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)        2026-01-29 12:33:21 -08:00
htp-drv.h          hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)        2026-01-29 12:33:21 -08:00
libdl.h            hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)        2026-01-29 12:33:21 -08:00
libggml-htp.inf    hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)        2026-01-29 12:33:21 -08:00
op-desc.h          ggml-hexagon: create generalized functions for cpu side op (#17500)            2025-12-22 23:13:24 -08:00