llama.cpp/ggml
Jeff Bolz 6efcd65945
vulkan: optimize flash attention split_k_reduce (#14554)
* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num (the number of split_k partitions) can get rather large. Use the whole workgroup to reduce the M/L (softmax max and exp-sum) values.

Launch a thread for each element in the HSV (V head size) dimension of the output. This helps a
lot for large HSV (e.g. DeepSeek).
2025-07-08 20:11:42 +02:00
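
For background on what the reduce step does: split_k flash attention partitions the KV sequence into k_num chunks, each producing a partial output row plus its running softmax max M and exp-sum L, and the reduce pass merges the partials with the usual log-sum-exp rescaling. Below is a minimal scalar sketch of that reduction, with hypothetical C++ types and names and assuming each split stores an unnormalized partial output; the actual work happens in a Vulkan compute shader over packed GPU buffers.

```cpp
// Scalar reference for a split_k reduce (a sketch, not the actual shader).
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical per-split state; the real shader reads these from GPU buffers.
struct Split {
    float M;              // running softmax max for this split
    float L;              // running sum of exp(score - M) for this split
    std::vector<float> O; // unnormalized partial output, length = HSV
};

std::vector<float> split_k_reduce(const std::vector<Split> &splits, size_t hsv) {
    // Pass 1: global max over all splits. When k_num is large, this is the
    // loop the commit spreads across the whole workgroup.
    float M = -std::numeric_limits<float>::infinity();
    for (const Split &s : splits) M = std::max(M, s.M);

    // Pass 2: rescale each split's exp-sum to the common max.
    float L = 0.0f;
    for (const Split &s : splits) L += std::exp(s.M - M) * s.L;

    // Pass 3: combine partial outputs. On the GPU, the commit launches one
    // thread per element of this HSV-sized inner loop.
    std::vector<float> out(hsv, 0.0f);
    for (const Split &s : splits) {
        const float scale = std::exp(s.M - M);
        for (size_t d = 0; d < hsv; ++d) out[d] += scale * s.O[d];
    }
    for (size_t d = 0; d < hsv; ++d) out[d] /= L;
    return out;
}
```

The commit's two changes map onto this structure: passes 1 and 2 correspond to "use the whole workgroup to reduce the M/L values", and the per-element work in pass 3 is what gets one thread per HSV element.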
Name            Last commit message                                        Last commit date
cmake           ggml-cpu : rework weak alias on apple targets (#14146)     2025-06-16 13:54:15 +08:00
include         CUDA: add bilinear interpolation for upscale (#14563)      2025-07-08 10:11:18 +08:00
src             vulkan: optimize flash attention split_k_reduce (#14554)   2025-07-08 20:11:42 +02:00
.gitignore      vulkan : cmake integration (#8119)                         2024-07-13 18:12:39 +02:00
CMakeLists.txt  ggml : remove kompute backend (#14501)                     2025-07-03 07:48:32 +03:00