llama.cpp/ggml
Progeny Alpha 530e5bb117 vulkan: fuse w/k_gated broadcasts in chunked inter kernel
Load both s_w and s_kg before the first barrier instead of using
separate barriers for each. Reduces per-token barriers from 3 to 2,
eliminating 64 barriers per chunk.

GDN per-op: 6818 → 5205 µs (-23.6%). 16/16 tests pass.
2026-03-14 22:32:46 -04:00
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include llama : enable chunked fused GDN path (#20340) 2026-03-11 22:46:40 +02:00
src vulkan: fuse w/k_gated broadcasts in chunked inter kernel 2026-03-14 22:32:46 -04:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : fix typo gmml (#20512) 2026-03-13 14:36:13 +01:00