llama.cpp

Jeff Bolz 61bde8e21f vulkan: Reduce temporary memory usage for TOP_K (#17623)	2025-12-02 19:22:04 +01:00

- Compute row size for the temp buffer based on the output of the first pass.
- Update shader addressing math to use the output row size
- Pass the output row size as "ncols_output", what used to be "ncols_output" is now "k"

For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2MB to 500KB.
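The sizing arithmetic in the commit message can be sketched as follows. This is a hypothetical model, not code from ggml-vulkan.cpp: the 8-byte element, the two ping-pong buffers, the 256-element first-pass block, and all function names are assumptions, chosen only because they happen to be consistent with the figures quoted above.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical model: none of these names or constants come from ggml-vulkan.cpp.
constexpr std::size_t kElemBytes  = 8;   // assumed: one (value, index) pair per element
constexpr std::size_t kNumBuffers = 2;   // assumed: two buffers ping-ponged between passes
constexpr std::size_t kBlockSize  = 256; // assumed: input elements reduced per block in pass one

constexpr std::size_t ceil_div(std::size_t a, std::size_t b) {
    return (a + b - 1) / b;
}

// Temp buffer sized from the input row length.
constexpr std::size_t temp_bytes_from_input(std::size_t ncols) {
    return ncols * kElemBytes * kNumBuffers;
}

// Row length after the first pass: each block keeps at most k candidates.
constexpr std::size_t first_pass_output_cols(std::size_t ncols, std::size_t k) {
    return ceil_div(ncols, kBlockSize) * std::min(k, kBlockSize);
}

// Temp buffer sized from the first-pass output row length.
constexpr std::size_t temp_bytes_from_output(std::size_t ncols, std::size_t k) {
    return first_pass_output_cols(ncols, k) * kElemBytes * kNumBuffers;
}
```

Under these assumptions, `temp_bytes_from_input(200000)` gives 3,200,000 bytes and `temp_bytes_from_output(200000, 40)` gives 500,480 bytes, in line with the "about 3.2MB" and "500KB" figures. The actual element size, block size, and buffer count in the Vulkan backend may differ.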
cmake	cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)	2025-04-04 10:12:40 -03:00
vulkan-shaders	vulkan: Reduce temporary memory usage for TOP_K (#17623)	2025-12-02 19:22:04 +01:00
CMakeLists.txt	vulkan: Improve build time for MSVC (#16545)	2025-10-14 14:51:36 +02:00
ggml-vulkan.cpp	vulkan: Reduce temporary memory usage for TOP_K (#17623)	2025-12-02 19:22:04 +01:00