llama.cpp/ggml/src/ggml-vulkan
Georgi Gerganov b4ae50810e
metal : improve FA + improve MoE (#12612)
* ggml : FA with different K, V head sizes (CPU)

ggml-ci

* metal : add FA with HS=192

* metal : extend FA to support different K and V head sizes

ggml-ci

* metal : add FA vector kernels for heads K 192 and V 128

ggml-ci

* ggml : restrict op on other backends to equal head sizes

ggml-ci

* metal : optimize FA-vec kernel

ggml-ci

* metal : FA remove mq registers

* metal : improve MoE mul_mat_id condition

ggml-ci

* metal : fix comments + remove unnecessary addition

ggml-ci

* metal : avoid too much shared memory usage with mul_mat_id

ggml-ci
2025-03-28 20:21:59 +02:00
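The FA items in the commit above concern flash attention where the K head size (for example 192) differs from the V head size (for example 128). The following is a minimal C++ sketch of why that is possible, assuming a single query vector and plain std::vector storage. The names attn_one_query and supports_fa are hypothetical and are not ggml APIs. The query-key dot products run over the K head size, the output takes the V head size, and an online softmax keeps a running max and sum, as flash-attention kernels do. supports_fa only illustrates the kind of guard implied by "restrict op on other backends to equal head sizes". It is not the actual ggml-vulkan code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Rows = std::vector<std::vector<float>>;

// Hypothetical helper: attention output for one query vector.
// q has the K head size dk; each row of k has size dk; each row of v has size dv.
// The result has size dv, so dk and dv are free to differ (e.g. 192 vs 128).
std::vector<float> attn_one_query(const std::vector<float>& q, const Rows& k,
                                  const Rows& v, float scale) {
    assert(k.size() == v.size());  // one V row per K row
    const std::size_t dk = q.size();
    const std::size_t dv = v.empty() ? 0 : v[0].size();

    std::vector<float> acc(dv, 0.0f);                   // running weighted sum of V rows
    float m = -std::numeric_limits<float>::infinity();  // running max score
    float s = 0.0f;                                     // running sum of exp(score - m)

    for (std::size_t j = 0; j < k.size(); ++j) {
        float score = 0.0f;
        for (std::size_t i = 0; i < dk; ++i) score += q[i] * k[j][i];  // dot product over dk
        score *= scale;

        // Online softmax: rescale the previous accumulators to the new max.
        const float m_new = std::max(m, score);
        const float corr = std::exp(m - m_new);
        const float w = std::exp(score - m_new);
        for (std::size_t i = 0; i < dv; ++i) acc[i] = acc[i] * corr + w * v[j][i];  // sum over dv
        s = s * corr + w;
        m = m_new;
    }
    if (s > 0.0f) {
        for (float& x : acc) x /= s;  // normalize by the softmax denominator
    }
    return acc;
}

// Hypothetical capability check: a backend whose kernels assume a single head
// size reports the op as unsupported when the K and V head sizes differ.
bool supports_fa(std::size_t k_head_size, std::size_t v_head_size) {
    return k_head_size == v_head_size;
}
```

With a 192-element query and 128-element V rows, the result has 128 elements, while supports_fa(192, 128) returns false. This matches the commit's split: CPU and Metal gain kernels for differing head sizes, and other backends are restricted to equal ones.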
Name             Last commit                                                           Date
cmake            fix: ggml: fix vulkan-shaders-gen build (#10448)                      2025-01-15 14:17:42 +01:00
vulkan-shaders   vulkan: fix coopmat shader generation when cross-compiling (#12272)  2025-03-28 14:51:06 -03:00
CMakeLists.txt   vulkan: fix coopmat shader generation when cross-compiling (#12272)  2025-03-28 14:51:06 -03:00
ggml-vulkan.cpp  metal : improve FA + improve MoE (#12612)                             2025-03-28 20:21:59 +02:00