llama.cpp

Ruben Ortlam 75f3bc94e6 vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)	2026-04-13 14:21:31 +02:00
* use integer dot product for quantized KV flash attention
* small improvements
* fix SHMEM_STAGING indexing
* add missing KV type quants
* fixes
* add supported quants to FA tests
* readd fast paths for <8bit quants
* fix mmq gate and shmem checks
cmake	cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)	2025-04-04 10:12:40 -03:00
vulkan-shaders	vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)	2026-04-13 14:21:31 +02:00
CMakeLists.txt	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
ggml-vulkan.cpp	vulkan: Flash Attention DP4A shader for quantized KV cache (#20797)	2026-04-13 14:21:31 +02:00