* kleidiai : fix MUL_MAT support for batched (3D) inputs
The supports_op() check incorrectly rejected MUL_MAT operations with 3D
inputs (ne[2] > 1), but the actual compute_forward_qx() implementation
handles batched inputs correctly via a loop over ne12.
This caused models with Q4_0/Q8_0 weights to crash during graph scheduling
when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during
loading (tested with 2D inputs) but the runtime used 3D inputs.
Also relax the buffer check to allow supports_op() to be called during
weight loading when src[0]->buffer is NULL.
Fixes#20608
* Kleidiai support_ops should only return true for 3D inputs, not also 4D
* kleidiai: add data type check to get_tensor_traits
* Added check for F16 data type into get_tensor_traits path with input data
not in ggml_backend_cpu_kleidiai_buffer_type format (unsupported for Q4/8)
Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Change-Id: I9aca4b9b8d669d35db6f1dbcc4e080b1919b1de7
* updated ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
updated kleidiai.cpp file as per suggestion
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* * code review fixes
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
* * adds a comment that clarifies barrier usage
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
---------
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
Co-authored-by: Charles Xu <charles.xu@arm.com>
* ggml-cpu: Add CPU backend support for KleidiAI library
* Add environmental variable GGML_KLEIDIAI_SME
* Add support for multithread LHS conversion
* Switch kernel selection order to dotprod and i8mm
* updates for review comments
* More updates for review comments
* Reorganize and rename KleidiAI files
* Move ggml-cpu-traits.h to source file
* Update cmake for SME build and add alignment for SME
* Remove append GGML_USE_CPU_KLEIDIAI to the GGML_CDEF_PUBLIC list