llama.cpp/ggml/src/ggml-vulkan
Jeff Bolz 18ddaea2ae
vulkan: Optimize GGML_OP_CUMSUM (#18417)
* vulkan: Optimize GGML_OP_CUMSUM

There are two paths: the preexisting one, which processes a whole row per
workgroup in a single shader, and a new one that splits each row into multiple
blocks and runs two passes. The first pass computes partial sums within each
block; the second adds the block partials to compute the final result. The
multipass shader is used when there are a small number of large rows.

In the whole-row shader, handle multiple elements per invocation.
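
One way to picture "multiple elements per invocation" is the CPU model below. It is a sketch under assumptions, not the shader's actual layout: the name `cumsum_row_multi_elem` is hypothetical, the contiguous assignment of elements to invocations is assumed, and the serial loop in step 2 stands in for whatever cross-invocation scan the shader performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Models one workgroup scanning a whole row, where each invocation ("thread")
// owns elem_per_thread contiguous elements (an assumed layout).
std::vector<float> cumsum_row_multi_elem(const std::vector<float> &row,
                                         std::size_t elem_per_thread) {
    assert(elem_per_thread > 0);
    const std::size_t n = row.size();
    const std::size_t num_threads = (n + elem_per_thread - 1) / elem_per_thread;
    std::vector<float> out(n);
    std::vector<float> thread_total(num_threads, 0.0f);

    // Step 1: each invocation scans its own elements sequentially.
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * elem_per_thread;
        const std::size_t end = std::min(begin + elem_per_thread, n);
        float running = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            running += row[i];
            out[i] = running;
        }
        thread_total[t] = running;
    }

    // Step 2: exclusive scan over the per-invocation totals.
    std::vector<float> thread_offset(num_threads, 0.0f);
    for (std::size_t t = 1; t < num_threads; ++t) {
        thread_offset[t] = thread_offset[t - 1] + thread_total[t - 1];
    }

    // Step 3: each invocation adds its offset to its own elements.
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * elem_per_thread;
        const std::size_t end = std::min(begin + elem_per_thread, n);
        for (std::size_t i = begin; i < end; ++i) {
            out[i] += thread_offset[t];
        }
    }
    return out;
}
```

With more elements per invocation, fewer totals take part in the cross-invocation scan in step 2, at the cost of more sequential work per invocation in step 1.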

* use 2 ELEM_PER_THREAD for AMD/Intel

* address feedback
2026-01-02 15:32:30 -06:00
cmake            cmake: fix ggml-shaders-gen compiler paths containing spaces (#12747)  2025-04-04 10:12:40 -03:00
vulkan-shaders   vulkan: Optimize GGML_OP_CUMSUM (#18417)                               2026-01-02 15:32:30 -06:00
CMakeLists.txt   vulkan: Improve build time for MSVC (#16545)                           2025-10-14 14:51:36 +02:00
ggml-vulkan.cpp  vulkan: Optimize GGML_OP_CUMSUM (#18417)                               2026-01-02 15:32:30 -06:00