llama.cpp

Georgi Gerganov b4ae50810e metal : improve FA + improve MoE (#12612)	2025-03-28 20:21:59 +02:00
* ggml : FA with different K, V head sizes (CPU)
  ggml-ci
* metal : add FA with HS=192
* metal : extend FA to support different K and V head sizes
  ggml-ci
* metal : add FA vector kernels for heads K 192 and V 128
  ggml-ci
* ggml : restrict op on other backends to equal head sizes
  ggml-ci
* metal : optimize FA-vec kernel
  ggml-ci
* metal : FA remove mq registers
* metal : improve MoE mul_mat_id condition
  ggml-ci
* metal : fix comments + remove unnecessary addition
  ggml-ci
* metal : avoid too much shared memory usage with mul_mat_id
  ggml-ci
cmake	fix: ggml: fix vulkan-shaders-gen build (#10448)	2025-01-15 14:17:42 +01:00
vulkan-shaders	vulkan: fix coopmat shader generation when cross-compiling (#12272)	2025-03-28 14:51:06 -03:00
CMakeLists.txt	vulkan: fix coopmat shader generation when cross-compiling (#12272)	2025-03-28 14:51:06 -03:00
ggml-vulkan.cpp	metal : improve FA + improve MoE (#12612)	2025-03-28 20:21:59 +02:00