llama.cpp/ggml
Jeff Bolz 303f8615e9
vulkan: Multi-pass softmax for large number of cols (#17892)
When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
2025-12-13 10:04:29 +01:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951) 2025-12-12 16:26:03 +02:00
src vulkan: Multi-pass softmax for large number of cols (#17892) 2025-12-13 10:04:29 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784) 2025-12-08 10:41:34 +02:00