llama.cpp/ggml
Jeff Bolz 2960eb2975
vulkan: Use one row per workgroup for f32 mmv (#17711)
The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before
the topk_moe selection. Running multiple rows per workgroup launches too few
workgroups to utilize the SMs well. I think even for larger m, f32 is so
bandwidth-limited that running multiple rows per workgroup doesn't help.
2025-12-06 11:12:26 +01:00
cmake           ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)  2025-08-07 13:45:41 +02:00
include         rpc : fix alloc size logic (#17116)                                       2025-12-05 19:39:04 +02:00
src             vulkan: Use one row per workgroup for f32 mmv (#17711)                    2025-12-06 11:12:26 +01:00
.gitignore      vulkan : cmake integration (#8119)                                        2024-07-13 18:12:39 +02:00
CMakeLists.txt  build : move _WIN32_WINNT definition to headers (#17736)                  2025-12-04 07:04:02 +01:00