llama.cpp

Commit Graph

Author	SHA1	Message	Date
Progeny Alpha	530e5bb117	vulkan: fuse w/k_gated broadcasts in chunked inter kernel Load both s_w and s_kg before the first barrier instead of using separate barriers for each. Reduces per-token barriers from 3 to 2, eliminating 64 barriers per chunk. GDN per-op: 6818 → 5205 µs (-23.6%). 16/16 tests pass.	2026-03-14 22:32:46 -04:00
Progeny Alpha	e22c2b2c85	vulkan: clean up chunked GDN shaders for PR review Remove verbose algorithm comments, section dividers, stale inline constant annotations, and unused extensions. Match llama.cpp codebase style (minimal comments, no section decorators). No functional changes. 16/16 tests pass.	2026-03-14 03:49:27 -04:00
Progeny Alpha	d2fabedf09	vulkan: fix chunked inter kernel state layout for PR #20443 PR #20443 removed redundant state transposes from the graph and updated the autoregressive shader to use col*S_V+i (coalesced) instead of i*S_V+col (strided). The chunked inter kernel was not updated, causing uncoalesced state reads and a ~8% PP regression. Fix state_in load and final_out write to match the new layout. h_snapshots (h_out/h_in) are internal scratch and keep their existing layout since inter and output kernels agree. PP-512: 202 → 218 t/s. 16/16 tests pass.	2026-03-13 23:34:59 -04:00
Progeny Alpha	949a7e86d3	vulkan: add chunked parallel kernel infrastructure for GATED_DELTA_NET Three-dispatch chunked pipeline for prompt processing acceleration: intra-chunk WY decomposition, inter-chunk state propagation, output combination. Currently disabled (threshold=UINT32_MAX). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 21:45:42 -04:00

4 Commits