When using grouped-query attention, there is one workgroup per KV batch, which can mean very few workgroups (e.g. just 8 in some models), too few to keep every SM busy. Enable split_k to spread the work across SMs. This helps a lot when the KV cache is large.
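As a rough illustration of the idea, the following CPU-side sketch splits the KV range of a single query into chunks, computes a partial softmax per chunk, and merges the partial results. It is not the Vulkan shader from this change: `choose_split_k`, `attend_chunk`, `split_k_attention`, the `Partial` struct, and the `min_chunk` threshold are all hypothetical names and values chosen for the example, and the scores are assumed to be already computed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical heuristic: split only when there are fewer workgroups than SMs,
// and keep each chunk at least min_chunk KV entries long.
inline unsigned choose_split_k(unsigned workgroups, unsigned sm_count,
                               unsigned kv_len, unsigned min_chunk = 128) {
    if (workgroups == 0 || workgroups >= sm_count) return 1;
    unsigned by_occupancy = sm_count / workgroups;          // splits needed to fill the SMs
    unsigned by_length = std::max(1u, kv_len / min_chunk);  // avoid tiny chunks
    return std::max(1u, std::min(by_occupancy, by_length));
}

// Partial softmax state produced by one chunk (one "split").
struct Partial {
    float max_score;         // maximum score seen in the chunk
    float sum_exp;           // sum of exp(score - max_score)
    std::vector<float> acc;  // sum of exp(score - max_score) * value
};

inline Partial attend_chunk(const std::vector<float>& scores,
                            const std::vector<std::vector<float>>& values,
                            std::size_t begin, std::size_t end) {
    const std::size_t dim = values.empty() ? 0 : values[0].size();
    Partial p{-std::numeric_limits<float>::infinity(), 0.0f,
              std::vector<float>(dim, 0.0f)};
    for (std::size_t i = begin; i < end; ++i)
        p.max_score = std::max(p.max_score, scores[i]);
    for (std::size_t i = begin; i < end; ++i) {
        const float w = std::exp(scores[i] - p.max_score);
        p.sum_exp += w;
        for (std::size_t d = 0; d < dim; ++d) p.acc[d] += w * values[i][d];
    }
    return p;
}

// Attention output for one query, computed in split_k independent chunks
// and then reduced. On a GPU each chunk would map to its own workgroup.
inline std::vector<float> split_k_attention(
        const std::vector<float>& scores,
        const std::vector<std::vector<float>>& values, unsigned split_k) {
    assert(scores.size() == values.size());
    const std::size_t n = scores.size();
    const std::size_t dim = values.empty() ? 0 : values[0].size();
    split_k = std::max(1u, split_k);
    const std::size_t chunk = (n + split_k - 1) / split_k;  // ceil(n / split_k)

    std::vector<Partial> parts;
    for (std::size_t begin = 0; begin < n; begin += chunk)
        parts.push_back(attend_chunk(scores, values, begin, std::min(n, begin + chunk)));

    // Reduction: rescale every partial to the global maximum, then normalize.
    float global_max = -std::numeric_limits<float>::infinity();
    for (const Partial& p : parts) global_max = std::max(global_max, p.max_score);

    float denom = 0.0f;
    std::vector<float> out(dim, 0.0f);
    for (const Partial& p : parts) {
        const float scale = std::exp(p.max_score - global_max);
        denom += scale * p.sum_exp;
        for (std::size_t d = 0; d < dim; ++d) out[d] += scale * p.acc[d];
    }
    for (std::size_t d = 0; d < dim; ++d) out[d] /= denom;
    return out;
}
```

The reduction step is the cost of splitting: each chunk must keep its own maximum and exponential sum so the partial outputs can be rescaled consistently. That overhead is why a sketch like this only splits when the workgroup count is below the SM count and the KV range is long enough.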