llama.cpp


kangletian 49a5ff40e2 mmvq: add RDNA4-specific parameter table (nwarps=8, rows=1)	2026-02-11 09:44:01 +00:00

Add a dedicated MMVQ_PARAMETERS_RDNA4 entry separate from RDNA2/RDNA3. For bs=1 decode on RDNA4 (gfx1201), the optimal config is nwarps=8, rows=1:
- 8 warps × 32 threads = 256 threads per block
- blocks_per_iter = vdr × nwarps × warp_size / qi = 2 × 8 × 32 / 4 = 128
- For K=4096: blocks_per_row = 128, so the entire K dimension is covered in a single iteration
- Maximizes memory-level parallelism on RDNA4

Benchmark (Llama 2 7B Q4_0, AMD Radeon AI PRO R9700):
- Master: 95.05 tok/s (tg128)
- nwarps=8: 104.82 tok/s (tg128) → +10.3%
- pp512: no regression (1448 vs 1449 tok/s)
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	ggml-virtgpu: make the code thread safe (#19204)	2026-02-04 10:46:18 +08:00
src	mmvq: add RDNA4-specific parameter table (nwarps=8, rows=1)	2026-02-11 09:44:01 +00:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	Bump cmake max version (needed for Windows on Snapdragon builds) (#19188)	2026-02-01 14:13:38 -08:00