llama.cpp

History

Alberto Cabrera Pérez be8890e721 ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888 ) * Boilerplate for q6_K repack * q6_K repack to q6_Kx8 implementation Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * q6_K generic gemv and gemm * wip, gemm_q6_K 8x8 * Still WIP: loading of q8s, q6h and q6l * first working version of q6_K gemm * Moved q6 loads outside of sb block, Unrolled inner loop * Replaced modulo with mask * First implementation of GEMV * ggml_vdotq_s32 -> vdotq_s32 * Reduce width of accumulators in q6_K gemv * Bsums instead of calc bias. Preload scales to use vget_lane. Unroll. * Reuse scales in GEMM (same GEMV opt) * Added todos for bsum and different qh repack * Arch fallback * VSLIQ for merging qh adn ql * Removed TODO, already tested * Apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Removed unused import --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>		2026-01-27 11:08:10 +02:00
..
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094 )	2025-08-07 13:45:41 +02:00
include	ggml : add ggml_build_forward_select (#18550 )	2026-01-19 20:03:19 +02:00
src	ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888 )	2026-01-27 11:08:10 +02:00
.gitignore	vulkan : cmake integration (#8119 )	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : bump version to 0.9.5 (ggml/1410)	2025-12-31 18:54:43 +02:00