llama.cpp

Jeff Bolz 2960eb2975 vulkan: Use one row per workgroup for f32 mmv (#17711) The MoE models have a mul_mat_vec with very small m (32, 64, 128) right before the topk_moe selection. Running multiple rows per wg doesn't utilize the SMs well. I think even for larger m, f32 is so bandwidth-limited that running multiple rows doesn't help.		2025-12-06 11:12:26 +01:00
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	rpc : fix alloc size logic (#17116)	2025-12-05 19:39:04 +02:00
src	vulkan: Use one row per workgroup for f32 mmv (#17711)	2025-12-06 11:12:26 +01:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	build : move _WIN32_WINNT definition to headers (#17736)	2025-12-04 07:04:02 +01:00