llama.cpp/ggml/src/ggml-vulkan
Ruben Ortlam 75f3bc94e6
vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)
* use integer dot product for quantized KV flash attention

* small improvements

* fix SHMEM_STAGING indexing

* add missing KV type quants

* fixes

* add supported quants to FA tests

* re-add fast paths for <8bit quants

* fix mmq gate and shmem checks
2026-04-13 14:21:31 +02:00
cmake            cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)  2025-04-04 10:12:40 -03:00
vulkan-shaders   vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)    2026-04-13 14:21:31 +02:00
CMakeLists.txt   chore : correct typos [no ci] (#20041)                                 2026-03-05 08:50:21 +01:00
ggml-vulkan.cpp  vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)    2026-04-13 14:21:31 +02:00