llama.cpp/ggml/src/ggml-cpu/kleidiai
Justin Bradford 627670601a
kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620)
* kleidiai : fix MUL_MAT support for batched (3D) inputs

The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.

This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during
loading (tested with 2D inputs) but the runtime used 3D inputs.

Also relax the buffer check to allow supports_op() to be called during
weight loading when src[0]->buffer is NULL.

Fixes #20608

* Kleidiai support_ops should only return true for 3D inputs, not also 4D
2026-03-17 14:03:54 +02:00
..
kernels.cpp kleidiai : support for concurrent sme and neon kernel execution (#20070) 2026-03-10 09:25:25 +02:00
kernels.h kleidiai: add optimized per-channel kernels for Q8_0 (#16993) 2025-11-11 13:20:31 +02:00
kleidiai.cpp kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620) 2026-03-17 14:03:54 +02:00
kleidiai.h ggml-cpu: Add CPU backend support for KleidiAI library (#11390) 2025-02-20 15:06:51 +02:00