llama.cpp

Progeny Alpha 313ef74afe vulkan: add coopmat GEMM output kernel for chunked GDN		2026-03-13 21:45:42 -04:00

Add gated_delta_net_chunk_output_cm1.comp, a cooperative matrix variant of the chunked output kernel that replaces the O(N²) scalar intra-chunk loop with an f16 coopmat GEMM: A_decayed[64×64] @ vnew[64×128].

Kernel structure:
- Phase 1: Q@K^T via coopmat (unchanged from scalar variant)
- Phase 2a: Build causal decay mask → sh_adecay (f16, clamped)
- Phase 2b: Stage vnew into sh_kv (f16, pre-scaled by 1/√d)
- Pass 1: Inter-chunk Q@S → dst (scalar, 128 threads)
- Pass 2: Intra-chunk coopmat GEMM (full chunks) or scalar fallback (partial last chunk)

3 barriers total, 62.7KB shared memory. Pipeline registered but not yet dispatched (threshold remains disabled).

Test tolerance bumped to 5e-3 for n_seq_tokens≥64 to account for f16 intermediate precision in the coopmat path. 16/16 backend tests pass.
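
The commit message says the O(N²) scalar intra-chunk loop is replaced by a single GEMM against a causally masked matrix. The equivalence can be illustrated with a small pure-Python sketch. This is not the shader code: the function names are hypothetical, the sizes are tiny instead of 64×64 and 64×128, everything runs in double precision rather than f16, and the decay weight exp(g[i] - g[j]) is an assumed form of the gate term that the commit message does not spell out.

```python
import math

def intra_chunk_scalar(qk, g, vnew, scale):
    """Scalar reference: row i accumulates contributions from every j <= i."""
    n, dv = len(vnew), len(vnew[0])
    out = [[0.0] * dv for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only positions j <= i contribute
            # Assumed decay form: exp(g[i] - g[j]); not confirmed by the commit.
            w = qk[i][j] * math.exp(g[i] - g[j]) * scale
            for c in range(dv):
                out[i][c] += w * vnew[j][c]
    return out

def build_decayed_mask(qk, g):
    """Fold the causal mask and the decay into one matrix (cf. Phase 2a)."""
    n = len(qk)
    return [
        [qk[i][j] * math.exp(g[i] - g[j]) if j <= i else 0.0 for j in range(n)]
        for i in range(n)
    ]

def matmul(a, b):
    """Plain dense matrix product, standing in for the coopmat GEMM."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][c] for k in range(inner)) for c in range(cols)]
        for row in a
    ]

def intra_chunk_gemm(qk, g, vnew, scale):
    """GEMM form: A_decayed @ (scale * vnew), with vnew pre-scaled (cf. Phase 2b)."""
    a_decayed = build_decayed_mask(qk, g)
    v_scaled = [[scale * x for x in row] for row in vnew]
    return matmul(a_decayed, v_scaled)
```

Because the entries above the diagonal of the masked matrix are zero, the dense product visits the same nonzero terms as the scalar loop, so the two functions agree up to floating-point rounding. In the kernel described above the intermediates are f16, which is the stated reason for the looser 5e-3 test tolerance.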
..
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	llama : enable chunked fused GDN path (#20340)	2026-03-11 22:46:40 +02:00
src	vulkan: add coopmat GEMM output kernel for chunked GDN	2026-03-13 21:45:42 -04:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : fix typo gmml (#20512)	2026-03-13 14:36:13 +01:00