llama.cpp

Jeff Bolz 1946e46f4c vulkan: For coopmat2 FA, use fp16 accumulators for the final result (#19376) The cpu and cuda backends use fp16 for the VKQ accumulator type, this change does the same for vulkan. This helps particularly with large head sizes which are very register-limited. I tried this for the coopmat1 path and it slowed down a bit. I didn't try for scalar. I applied the softmax bias that the cuda backend uses to avoid overflow, although I was not able to reproduce the original bug without it.	2026-02-06 09:15:13 +01:00
cmake	cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)	2025-04-04 10:12:40 -03:00
vulkan-shaders	vulkan: For coopmat2 FA, use fp16 accumulators for the final result (#19376)	2026-02-06 09:15:13 +01:00
CMakeLists.txt	vulkan: Improve build time for MSVC (#16545)	2025-10-14 14:51:36 +02:00
ggml-vulkan.cpp	vulkan: make FA mask/softcap enables spec constants (#19309)	2026-02-06 08:49:58 +01:00