* Updated repack.cpp
* Added an if condition to support only vector length 256.
* Changed the format, removed comments and a duplicate variable.
* If SVE 256 is not present, the generic function was used for the computation, which hurt performance. Added code so that when SVE 256 is unavailable, the NEON path is used instead (see the sketch below).
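A minimal sketch of the dispatch idea, with hypothetical gemm_sve256/gemm_neon kernels standing in for the real repack.cpp routines:

```cpp
#include <arm_sve.h>

void gemm_sve256(const float *, const float *, float *, int); // hypothetical SVE-256 kernel
void gemm_neon  (const float *, const float *, float *, int); // hypothetical NEON kernel

void gemm_repack(const float *a, const float *b, float *c, int n) {
    // svcntb() returns the SVE register width in bytes; 32 bytes == 256 bits
    if (svcntb() == 32) {
        gemm_sve256(a, b, c, n);  // take the SVE-256 path
    } else {
        gemm_neon(a, b, c, n);    // otherwise fall back to NEON, not the slow generic path
    }
}
```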
* Code format change suggestion
---------
Co-authored-by: Vithule, Prashant <Prashant.Vithule@fujitsu.com>
* cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization
- load all 8 int8 for a grid position in one load
- calculate signs via popcnt instead of fetching from the ksigns table (see the sketch below)
- broadcast signs to drop individual shift/mask
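A host-side C++ sketch of the popcnt trick: the eighth sign bit is not stored, it is the odd parity of the seven stored bits, which is exactly what the ksigns_iq2xs table encodes (the CUDA kernel computes it with __popc and broadcasts the resulting byte):

```cpp
#include <bit>      // std::popcount (C++20)
#include <cstdint>

// Rebuild all 8 sign bits for one grid position from the 7 stored bits.
// Bit 7 is the odd parity of bits 0..6, matching the ksigns_iq2xs table.
static inline uint8_t signs_from_bits(uint32_t bits7) {
    return uint8_t(bits7 | ((std::popcount(bits7) & 1u) << 7));
}
```

Sign j is then applied as `(signs >> j) & 1 ? -v : v`, with no per-element table fetch.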
* cuda: iq2xxs: simplify sum scaling
express `(sum * scale + sum / 2) / 4` as `(sum * (scale * 2 + 1)) / 8`
express `((aux32 >> 28) * 2 + 1)` as `(aux32 >> 27 | 1)`
saves 3 registers for mul_mat_vec_q (152 -> 149) according to Nsight
AFAICT no overflow can occur here as iq2xxs values are far too small
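Both rewrites can be checked exhaustively; this is my own verification, not part of the commit. The first identity survives C's truncating division because `sum * (scale * 2 + 1)` is odd whenever `sum` is odd, so neither side ever rounds across a multiple of 8 that the other misses:

```cpp
#include <cassert>
#include <cstdint>

int main() {
    // (sum * scale + sum / 2) / 4 == (sum * (scale * 2 + 1)) / 8
    for (int scale = 0; scale < 16; ++scale)        // 4-bit scale field
        for (int sum = -4096; sum <= 4096; ++sum)   // iq2xxs sums are far smaller
            assert((sum * scale + sum / 2) / 4 == (sum * (scale * 2 + 1)) / 8);

    // (aux32 >> 28) * 2 + 1 == (aux32 >> 27 | 1): bit 27 is simply replaced by 1
    for (uint32_t hi = 0; hi < 32; ++hi) {
        const uint32_t aux32 = hi << 27;
        assert((aux32 >> 28) * 2 + 1 == ((aux32 >> 27) | 1));
    }
    return 0;
}
```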
* uint -> uint32_t
error: identifier "uint" is undefined
This commit addresses a build issue with the KleidiAI backend when
building multiple CPU backends. Commit
3a00c98584 ("cmake : fix KleidiAI install
target failure with EXCLUDE_FROM_ALL") introduced a change where
FetchContent_Populate is called instead of FetchContent_MakeAvailable;
the latter handles this case because it is idempotent, while
FetchContent_Populate is not.
I missed this during my review and should not have committed without
verifying the CI failure, sorry about that.
* ggml-cpu: FA add GEMM microkernel (see the sketch below)
* add guard for sizeless vector types
* fix case where DV % GGML_F32_EPR != 0
* move memset out of the loop
* move another memset out of the loop
* use RM=4 for arm
* simd_gemm: convert everything to int
* convert everything to size_t to avoid warnings
* fixup
* add pragma for ignoring aggressive loop optimizations
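A scalar sketch of the register-blocked microkernel shape these commits converge on (RM = 4 rows per block, as in the arm tuning above); the real kernel uses ggml's SIMD wrappers rather than plain floats:

```cpp
#include <cstddef>

constexpr size_t RM = 4;  // rows kept live in registers per block

// Hypothetical microkernel: C[RM x N] += A[RM x K] * B[K x N].
// Each B element is loaded once and reused across all RM rows,
// and the accumulators are written back only after the K loop.
void simd_gemm_block(const float *A, const float *B, float *C,
                     size_t K, size_t N) {
    for (size_t j = 0; j < N; ++j) {
        float acc[RM] = {0.0f};            // zeroed once, outside the K loop
        for (size_t k = 0; k < K; ++k) {
            const float b = B[k * N + j];  // one load shared by RM rows
            for (size_t r = 0; r < RM; ++r)
                acc[r] += A[r * K + k] * b;
        }
        for (size_t r = 0; r < RM; ++r)
            C[r * N + j] += acc[r];        // single write-back per output
    }
}
```

Using size_t throughout mirrors the signedness cleanup in the commits above.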
* cmake: fix KleidiAI install target failure with EXCLUDE_FROM_ALL
Fix for bug #19501: add EXCLUDE_FROM_ALL to FetchContent_Declare. This properly excludes KleidiAI from both the build and install targets, preventing install failures when GGML_CPU_KLEIDIAI=ON is used.
The KleidiAI source files are still compiled into libggml-cpu.so, preserving all functionality.
* addressed code review comments
last_graph is only available without OpenMP, but
ggml_graph_compute_thread() is called in both cases.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
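A minimal sketch of the guard pattern this implies, assuming ggml's GGML_USE_OPENMP compile-time switch (the struct layout and the function body are illustrative, not the actual ggml code):

```cpp
// last_graph exists only in the non-OpenMP (thread-pool) build, so the
// shared compute function must guard every access to it.
struct compute_state {
#ifndef GGML_USE_OPENMP
    const void * last_graph;   // hypothetical cached-graph field
#endif
    int n_threads;
};

static void ggml_graph_compute_thread(struct compute_state * s) {
#ifndef GGML_USE_OPENMP
    if (s->last_graph) {
        // reuse the cached graph; compiled only when the field exists
    }
#endif
    // common work that runs in both OpenMP and thread-pool builds
    (void) s;
}
```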
* ggml-hexagon: fa improvements
ggml-hexagon: optimize flash attention calculations with improved variable handling
ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32
ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements
ggml-hexagon: optimize flash attention by changing slope vector type to F16
* hexfa: fixed test-backend-ops failures due to leftover element handling
* hexagon: refactor and optimize fa to use local context struct
* ggml-hexagon: optimize flash-attention using hvx_vec_expf
Use HVX for online softmax.
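A scalar sketch of the online softmax recurrence that the HVX path vectorizes, with hvx_vec_expf evaluating the exponentials a full vector at a time (this is the standard streaming formulation, not the Hexagon code itself):

```cpp
#include <cmath>
#include <cstddef>

// Single-pass softmax: track the running max and rescale the running
// sum whenever a larger score appears, so the exponentials never overflow.
void online_softmax(const float *scores, float *weights, size_t n) {
    float m = -INFINITY;  // running maximum
    float s = 0.0f;       // running sum of exp(score - m)
    for (size_t i = 0; i < n; ++i) {
        const float m_new = std::fmax(m, scores[i]);
        s = s * std::exp(m - m_new) + std::exp(scores[i] - m_new);
        m = m_new;
    }
    for (size_t i = 0; i < n; ++i)
        weights[i] = std::exp(scores[i] - m) / s;
}
```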
---------
Co-authored-by: chraac <chraac@gmail.com>