llama.cpp

History

iacopPBK 495c363267 ds_read_b128 for q4_0 and q4_1 mmq kernels Current for loop generates ds_read_b32 instructions with hip compiler, the new solution generates ds_read_b128 instructions for the same operation, saving some LDS bandwidth. Tested on MI50 and RX6800XT, its faster on both.		2026-03-30 02:09:24 +02:00
..
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094 )	2025-08-07 13:45:41 +02:00
include	llama: fix llama-model-saver (#20503 )	2026-03-25 12:53:16 +02:00
src	ds_read_b128 for q4_0 and q4_1 mmq kernels	2026-03-30 02:09:24 +02:00
.gitignore	vulkan : cmake integration (#8119 )	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : bump version to 0.9.8 (ggml/1442)	2026-03-18 15:17:28 +02:00