llama.cpp

Jeff Bolz 18ddaea2ae vulkan: Optimize GGML_OP_CUMSUM (#18417)	2026-01-02 15:32:30 -06:00

* vulkan: Optimize GGML_OP_CUMSUM

  There are two paths: The preexisting one that does a whole row per workgroup in a single shader, and one that splits each row into multiple blocks and does two passes. The first pass computes partials within a block, the second adds the block partials to compute the final result. The multipass shader is used when there are a small number of large rows.

  In the whole-row shader, handle multiple elements per invocation.
* use 2 ELEM_PER_THREAD for AMD/Intel
* address feedback
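
The two-pass scheme described in the commit message can be illustrated on the CPU. The following is a minimal C++ sketch of the general idea only, not the Vulkan shader code from the commit: the function names `cumsum_reference` and `cumsum_two_pass` and the `block_size` parameter are hypothetical, and the loops over blocks stand in for work that a GPU would distribute across workgroups.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: sequential inclusive prefix sum over a single row.
std::vector<float> cumsum_reference(const std::vector<float>& row) {
    std::vector<float> out(row.size());
    float running = 0.0f;
    for (std::size_t i = 0; i < row.size(); ++i) {
        running += row[i];
        out[i] = running;
    }
    return out;
}

// Two-pass variant. block_size is a hypothetical tuning parameter
// (assumption: it plays the role of the per-block element count).
std::vector<float> cumsum_two_pass(const std::vector<float>& row,
                                   std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t n = row.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;

    std::vector<float> out(n);
    std::vector<float> block_totals(num_blocks, 0.0f);

    // Pass 1: each block computes its own inclusive prefix sum,
    // independently of the others, and records its total.
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(n, begin + block_size);
        float running = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            running += row[i];
            out[i] = running;
        }
        block_totals[b] = running;
    }

    // Pass 2: add the sum of all preceding block totals to every
    // element of each block to obtain the final result.
    float offset = 0.0f;
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t begin = b * block_size;
        const std::size_t end = std::min(n, begin + block_size);
        for (std::size_t i = begin; i < end; ++i) {
            out[i] += offset;
        }
        offset += block_totals[b];
    }
    return out;
}
```

Because the blocks in pass 1 do not depend on one another, a single long row can keep many workgroups busy, which is consistent with the commit's note that the multipass path is chosen when there are a small number of large rows. Floating-point results may differ slightly from the sequential version because the additions are grouped differently.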
cmake	cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)	2025-04-04 10:12:40 -03:00
vulkan-shaders	vulkan: Optimize GGML_OP_CUMSUM (#18417)	2026-01-02 15:32:30 -06:00
CMakeLists.txt	vulkan: Improve build time for MSVC (#16545)	2025-10-14 14:51:36 +02:00
ggml-vulkan.cpp	vulkan: Optimize GGML_OP_CUMSUM (#18417)	2026-01-02 15:32:30 -06:00