llama.cpp/ggml
Ruben Ortlam bcf5bda6f5
Vulkan MMQ Integer Dot Refactor and K-Quant support (#16536)
* vulkan: add mmq q2_k integer dot support

* Refactor mmq caching

* Reduce mmq register use

* Load 4 quant blocks into shared memory in one step

* Pack q2_k blocks into caches of 32

* Use 32-bit accumulators for integer dot matmul

* Add q4_k mmq

* Add q3_k mmq

* Add q5_k mmq

* Add q6_k mmq

* Add mxfp4 mmq, enable MMQ MUL_MAT_ID

* Fix mmv dm loads
2025-10-29 14:39:03 +01:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) 2025-10-22 13:47:09 -07:00
src Vulkan MMQ Integer Dot Refactor and K-Quant support (#16536) 2025-10-29 14:39:03 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt Add experimental ggml-hexagon backend for the Hexagon NPU (#16547) 2025-10-22 13:47:09 -07:00