llama.cpp/ggml/src/ggml-vulkan
Latest commit: f01bd02376 by Jeff Bolz (2025-04-02 14:25:08 -05:00)
vulkan: Implement split_k for coopmat2 flash attention. (#12627)

When using group query attention, we have one workgroup per KV batch and this
can be very few workgroups (e.g. just 8 in some models). Enable split_k to
spread the work across SMs. This helps a lot when the KV cache is large.
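
The commit description above gives the motivation. As a rough illustration of what a
split_k reduction does (a CPU-side sketch under assumed names such as PartialAttn and
combine_splits, not the actual coopmat2 shader code in this directory): each split
processes a disjoint chunk of the KV cache and emits an unnormalized output accumulator
plus its local softmax max and sum, and a final pass rescales and combines the partials.

    // Illustrative CPU-side sketch of a split_k reduction for flash attention.
    // Not the actual Vulkan shader; names and structure are hypothetical.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct PartialAttn {
        float m;               // local max of the attention scores in this split
        float l;               // local sum of exp(score - m)
        std::vector<float> o;  // unnormalized output: sum of exp(score - m) * V rows
    };

    // Combine per-split partial results into one final attention output row.
    std::vector<float> combine_splits(const std::vector<PartialAttn>& parts) {
        // Global max across splits keeps the rescaling numerically stable.
        float m = -INFINITY;
        for (const auto& p : parts) m = std::max(m, p.m);

        const size_t d = parts[0].o.size();
        std::vector<float> o(d, 0.0f);
        float l = 0.0f;
        for (const auto& p : parts) {
            const float scale = std::exp(p.m - m);  // rescale split to the global max
            l += scale * p.l;
            for (size_t i = 0; i < d; ++i) o[i] += scale * p.o[i];
        }
        for (size_t i = 0; i < d; ++i) o[i] /= l;   // final softmax normalization
        return o;
    }

With one workgroup per KV batch under group query attention, only a handful of
workgroups exist (e.g. just 8), leaving most SMs idle; splitting the KV range into
split_k chunks multiplies the workgroup count at the cost of this small reduction pass,
which is why the change pays off most when the KV cache is large.
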
Name              Last commit                                                         Last commit date
cmake             fix: ggml: fix vulkan-shaders-gen build (#10448)                    2025-01-15 14:17:42 +01:00
vulkan-shaders    vulkan: Implement split_k for coopmat2 flash attention. (#12627)    2025-04-02 14:25:08 -05:00
CMakeLists.txt    cmake: remove caching from vulkan coopmat checks (#12719)           2025-04-02 14:56:26 -03:00
ggml-vulkan.cpp   vulkan: Implement split_k for coopmat2 flash attention. (#12627)    2025-04-02 14:25:08 -05:00