llama.cpp/ggml/src/ggml-cuda/vendors
commit ea4a321f2a
Author: yulo
HIP: add fattn-mma-f16 for RDNA4 (#18481)
* finish VQ mma

* flash_attn_ext_f16_iter

* KQ_rowsum

* correct exp

* fix scale error

* fix softmax scale

* fix softmax scale

* enable fattn on cpu side

* fix random error

* disable fattn-mma-f16 on rdna3

* fix wrong col for rdna

* use identity mat to transpose

* resolve conflicts

* basic tuning for DeepSeek-R1-Distill-Qwen-1.5B

* fix volta compile error

* align rdna4 policy for fattn

* adjust fattn policy

* adjust kernel selection logic

* update per review comments

* keep fattn-wmma logic

* adjust kernel selection logic

---------

Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-13 13:52:16 +01:00
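Several of the bullets above ("KQ_rowsum", "correct exp", "fix softmax scale") concern the online-softmax bookkeeping that flash-attention kernels use: a running row maximum and row sum, with previously accumulated values rescaled whenever a larger score appears. The following is a host-side C++ sketch of that generic technique, not the actual HIP kernel code; the struct and member names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the online-softmax state kept per row of KQ scores.
// In the kernel this lives in registers per warp/row; here it is a
// plain struct processing one block of scores at a time.
struct OnlineSoftmax {
    float m = -INFINITY; // running row maximum of the KQ scores
    float l = 0.0f;      // running rowsum of exp(score - m)

    // Fold in one block of scores. Returns the correction factor by
    // which previously accumulated V-products must be rescaled,
    // since the reference maximum m may have grown.
    float update(const std::vector<float>& scores) {
        float m_new = m;
        for (float s : scores) m_new = std::fmax(m_new, s);
        const float correction = std::exp(m - m_new); // rescale old rowsum
        l *= correction;
        for (float s : scores) l += std::exp(s - m_new);
        m = m_new;
        return correction;
    }
};
```

Processing the scores in blocks yields the same maximum and rowsum as a single pass, which is the invariant the "fix softmax scale" commits restore when the correction factor is applied incorrectly.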
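The "use identity mat to transpose" bullet refers to a common MMA trick: multiplying a tile by the identity matrix while feeding the tile through the operand that the matrix unit reads in a swapped (column-major) layout, so the accumulator receives the transpose without a separate shuffle. Below is a host-side C++ model of that idea under those assumptions; the function name is illustrative and this is not the RDNA4 kernel itself.

```cpp
#include <cassert>
#include <vector>

// Transpose an n x n row-major tile by computing C = I * A, where A is
// indexed column-major (a[j*n + k]) as the MMA "B" operand would be.
// Then C[i][j] = sum_k I[i][k] * A[j][k] = A[j][i], i.e. the transpose.
std::vector<float> transpose_via_identity(const std::vector<float>& a, int n) {
    std::vector<float> id(n * n, 0.0f);
    for (int i = 0; i < n; ++i) id[i * n + i] = 1.0f; // identity operand
    std::vector<float> out(n * n, 0.0f);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                out[i * n + j] += id[i * n + k] * a[j * n + k];
    return out;
}
```

On hardware the multiply is essentially free relative to a cross-lane shuffle sequence, which is why the kernel prefers an identity-operand MMA over explicit register permutation.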
File     Last commit                                                      Date
cuda.h   CUDA: experimental native mxfp4 support for blackwell (#17906)   2025-12-24 22:28:26 +08:00
hip.h    HIP: add fattn-mma-f16 for RDNA4 (#18481)                        2026-01-13 13:52:16 +01:00
musa.h   sampling : add support for backend sampling (#17004)             2026-01-04 22:22:16 +02:00