llama.cpp/ggml/src
Progeny Alpha b0323615c9 vulkan: fused inter+output kernel for chunked GDN
Merge the inter-chunk state propagation and output computation into a
single dispatch, reducing the chunked pipeline from 3 dispatches to 2.

State lives in registers across the sequential chunk loop. vnew is
computed in-kernel and passed to the coopmat GEMM via shared memory
(f16, packed with subgroup shuffles). This eliminates the VNew scratch
buffer (wu_size) and H_snapshots buffer (h_size) — ~786KB/head/seq
saved for PP-512.

Architecture per chunk:
  Step 1: Load K, Q, gcum → shared (all 256 threads)
  Step 2: Q@K^T coopmat → sh_attn (all 256 threads)
  Step 3: Decay mask + O_inter = Q@state → dst (parallel)
  Step 4: vnew = U - W@state → sh_kv (128 threads + k_gated assist)
  Step 5: O_intra = A_decayed @ vnew coopmat GEMM → dst
  Step 6: state = exp(decay) * state + delta

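The per-chunk steps above can be sketched in NumPy. This is a rough sketch of the chunked delta-rule math only, not the kernel's actual layout: the shapes, the exact decay factors derived from `gcum`, and the single-head `(C, D)` layout are assumptions for illustration.

```python
import numpy as np

def gdn_chunk(Q, K, W, U, gcum, state):
    """One chunk of the fused inter+output pass (NumPy sketch).

    Assumed shapes: Q, K, W, U are (C, D) for chunk length C and head
    dim D; gcum is (C,) cumulative log-decay within the chunk; state is
    (D, D) and is carried across chunks (in registers, in the kernel).
    """
    # Step 2: Q@K^T attention scores
    A = Q @ K.T
    # Step 3a: causal decay mask, exp(g_i - g_j) for j <= i, else 0
    A *= np.tril(np.exp(gcum[:, None] - gcum[None, :]))
    # Step 3b: inter-chunk output from the carried state
    O_inter = (np.exp(gcum)[:, None] * Q) @ state
    # Step 4: delta-rule value correction (vnew = U - W@state)
    vnew = U - (np.exp(gcum)[:, None] * W) @ state
    # Step 5: intra-chunk output from the decay-masked scores
    O_intra = A @ vnew
    # Step 6: decay the state and add this chunk's contribution
    delta = (np.exp(gcum[-1] - gcum)[:, None] * K).T @ vnew
    state = np.exp(gcum[-1]) * state + delta
    return O_inter + O_intra, state
```

In the kernel, `vnew` never leaves the workgroup: it is written to shared memory as f16 and consumed by the coopmat GEMM of step 5, which is what lets the VNew and H_snapshots scratch buffers go away.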
Shared memory: 63,744 / 65,536 bytes. 16/16 backend tests pass.
2026-03-13 21:45:42 -04:00
ggml-blas ggml: update comments for backends which have no memory to report (#20157) 2026-03-06 23:24:38 +08:00
ggml-cann CANN: Remove unnecessary wrapper for `ggml_backend_buft_is_cann` (#18968) 2026-02-10 14:19:30 +08:00
ggml-cpu graph : remove redundant GDN state transposes (#20443) 2026-03-13 22:12:54 +02:00
ggml-cuda graph : remove redundant GDN state transposes (#20443) 2026-03-13 22:12:54 +02:00
ggml-hexagon hexagon: add f32 ssm_conv op (#20122) 2026-03-06 09:59:26 -08:00
ggml-hip hip: compile debug builds with -O2 on hip to avoid a compiler bug (#20392) 2026-03-12 10:37:10 +08:00
ggml-metal graph : remove redundant GDN state transposes (#20443) 2026-03-13 22:12:54 +02:00
ggml-musa CUDA: faster tile FA, add oob checks, more HSs (#16492) 2025-10-11 20:54:32 +02:00
ggml-opencl opencl: use larger workgroup size for get_rows (#20316) 2026-03-11 22:03:27 -07:00
ggml-rpc rpc : use unordered_map::reserve and emplace (#18513) 2026-01-02 12:09:36 +02:00
ggml-sycl fix op rope, add rope_back (#20293) 2026-03-11 09:53:34 +08:00
ggml-virtgpu ggml-virtgpu: improve the reliability of the code (#19846) 2026-02-26 20:00:57 +08:00
ggml-vulkan vulkan: fused inter+output kernel for chunked GDN 2026-03-13 21:45:42 -04:00
ggml-webgpu ggml-webgpu: Add supports for `GGML_OP_REPEAT` (#20230) 2026-03-11 14:40:36 -07:00
ggml-zdnn ggml-zdnn : mark zDNN buffers as non-host (#18967) 2026-01-22 01:16:21 +01:00
ggml-zendnn ggml-zendnn: update code for latest ZenDNN API (#19923) 2026-02-27 08:43:41 +08:00
CMakeLists.txt hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
ggml-alloc.c ggml : make `ggml_is_view` as API (#19539) 2026-02-16 17:43:34 +02:00
ggml-backend-dl.cpp hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
ggml-backend-dl.h hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
ggml-backend-impl.h llama: use host memory if device reports 0 memory (#18587) 2026-01-09 05:34:56 +08:00
ggml-backend-reg.cpp ggml : use noexcept overload for is_regular_file in backend registration (#19452) 2026-02-10 10:57:48 +01:00
ggml-backend.cpp llama : disable graph reuse with pipeline parallelism (#20463) 2026-03-12 21:04:13 +02:00
ggml-common.h ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-impl.h ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-opt.cpp finetune: SGD optimizer, more CLI args (#13873) 2025-08-14 12:03:57 +02:00
ggml-quants.c ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-quants.h ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml.cpp ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) 2025-06-01 13:43:57 +03:00
gguf.cpp gguf : avoid too many file size calls (#19919) 2026-02-26 12:46:32 +02:00