llama.cpp

Jeff Bolz f01bd02376 vulkan: Implement split_k for coopmat2 flash attention. (#12627) When using group query attention, we have one workgroup per KV batch and this can be very few workgroups (e.g. just 8 in some models). Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.	2025-04-02 14:25:08 -05:00
cmake	fix: ggml: fix vulkan-shaders-gen build (#10448)	2025-01-15 14:17:42 +01:00
vulkan-shaders	vulkan: Implement split_k for coopmat2 flash attention. (#12627)	2025-04-02 14:25:08 -05:00
CMakeLists.txt	cmake: remove caching from vulkan coopmat checks (#12719)	2025-04-02 14:56:26 -03:00
ggml-vulkan.cpp	vulkan: Implement split_k for coopmat2 flash attention. (#12627)	2025-04-02 14:25:08 -05:00
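
The split_k idea described in the latest commit message can be illustrated with a small CPU-side sketch. This is not the code from ggml-vulkan.cpp or the shaders; it only shows the general technique of cutting the KV range into slices, computing a partial softmax-weighted sum per slice (the part that could run in separate workgroups), and merging the partials in a final reduction. The names `Partial`, `attend_slice`, and `attend_split_k` are hypothetical, and a single query row with scalar values is assumed for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical partial result for one slice of the KV range.
struct Partial {
    float max_score; // largest score seen in the slice
    float sum_exp;   // sum of exp(score - max_score) over the slice
    float weighted;  // sum of exp(score - max_score) * value over the slice
};

// Process KV entries [begin, end) for one query row.
// In a GPU setting, each slice would map to its own workgroup.
Partial attend_slice(const std::vector<float>& scores,
                     const std::vector<float>& values,
                     std::size_t begin, std::size_t end) {
    Partial p{-std::numeric_limits<float>::infinity(), 0.0f, 0.0f};
    for (std::size_t i = begin; i < end; ++i) {
        p.max_score = std::max(p.max_score, scores[i]);
    }
    for (std::size_t i = begin; i < end; ++i) {
        const float w = std::exp(scores[i] - p.max_score);
        p.sum_exp += w;
        p.weighted += w * values[i];
    }
    return p;
}

// Split the KV range into at most split_k slices, then merge the partials.
float attend_split_k(const std::vector<float>& scores,
                     const std::vector<float>& values,
                     std::size_t split_k) {
    assert(split_k > 0 && !scores.empty() && scores.size() == values.size());
    const std::size_t n = scores.size();
    const std::size_t chunk = (n + split_k - 1) / split_k; // ceiling division

    std::vector<Partial> parts;
    for (std::size_t begin = 0; begin < n; begin += chunk) {
        parts.push_back(attend_slice(scores, values, begin, std::min(n, begin + chunk)));
    }

    // Final reduction: rescale every partial to a common maximum before summing,
    // so the result matches a softmax computed over the whole range.
    float global_max = -std::numeric_limits<float>::infinity();
    for (const Partial& p : parts) {
        global_max = std::max(global_max, p.max_score);
    }
    float numerator = 0.0f;
    float denominator = 0.0f;
    for (const Partial& p : parts) {
        const float scale = std::exp(p.max_score - global_max);
        numerator += scale * p.weighted;
        denominator += scale * p.sum_exp;
    }
    return numerator / denominator;
}
```

Calling `attend_split_k(scores, values, 1)` and `attend_split_k(scores, values, 4)` should agree up to floating-point rounding. The larger split only changes how the work is divided, which is why it helps when there are few workgroups and a large KV cache.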