Commit Graph

279 Commits

Author SHA1 Message Date
Charles Xu 2d6c00a9b8
kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458)
* kleidiai: add and integrate SVE 256-bit vector-length kernel

* updated for review comments
2025-12-30 14:04:53 +02:00
Boian Berberov 94de74e7b1
cmake: Added more x86_64 CPU backends when building with `GGML_CPU_ALL_VARIANTS=On` (#18186)
* minor: Consolidated `#include <immintrin.h>` under `ggml-cpu-impl.h`

* cmake: Added more x86-64 CPU backends when building with `GGML_CPU_ALL_VARIANTS=On`

- `ivybridge`
- `piledriver`
- `cannonlake`
- `cascadelake`
- `cooperlake`
- `zen4`

Resolves: #17966
2025-12-28 09:33:29 +02:00
Taimur Ahmad d34d5ca1e9
llamafile: add rvv support for sgemm kernels (#18199)
Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
2025-12-22 20:20:23 +02:00
Taimur Ahmad f716588e63
ggml-cpu: extend support for RVV floating-point kernels (#17318)
* cmake: add BF16 RVV flag for ggml-cpu

* ggml-cpu: add floating-point conversion kernels

* ggml: add floating-point kernels

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>

* ggml-cpu: fix lmul in vec_dot_bf16

* ggml-cpu: change redsum to lmul 4, fix leftover

---------

Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>
2025-12-18 16:02:09 +02:00
Alberto Cabrera Pérez 669696e00d
ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (#18096)
* wip: skeleton for q8_0 repack

* q8_0 repack GEMV implementations

* GEMM implementations

* Formatting

* Fixed format consistency of repack gemm and gemv declarations

* gemv and gemm generic location consistent with declarations

* Removed non-correct unused variables statements

* Cleanup, consistent style

* Missing generic fallbacks for x86 and powerpc
2025-12-17 13:39:13 +02:00
Georgi Gerganov a63cbafbbc ggml : arm repack fix build 2025-12-14 08:33:51 +02:00
Georgi Gerganov 71fdcf0616 ggml : arm repack fix build (whisper/0) 2025-12-14 08:33:51 +02:00
ixgbe 51604435e8
ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951)
* ggml-cpu:fix RISC-V Q4_0 repack select and RVV feature reporting

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* using the name VLEN instead of CNT

* Update ggml/include/ggml-cpu.h

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-12 16:26:03 +02:00
Max Krasnyansky e1f4921980
Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748)
* tests: update barrier test to check for race condition in active threads

* cpu: combine n_graph and n_threads into a single atomic update

* tests: add multi-graph test for test_barrier
2025-12-10 12:32:23 -08:00
ixgbe 79d61896d3
ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784)
* ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* cmake: enable RISC-V zihintpause extension for Spacemit builds

* readme : add ZIHINTPAUSE support for RISC-V

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-12-08 10:41:34 +02:00
Phylliida Dev 09c7c50e64
ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985)
* Feat: Added vulkan circular tiling support

* Feat: Added cpu circular

* Feat: Added cuda kernels

* Added tests

* Added tests

* Removed non-pad operations

* Removed unneded changes

* removed backend non pad tests

* Update test-backend-ops.cpp

* Fixed comment on pad test

* removed trailing whitespace

* Removed unneded test in test-backend-ops

* Removed removed test from calls

* Update ggml/src/ggml-vulkan/vulkan-shaders/pad.comp

Co-authored-by: Ruben Ortlam <picard12@live.de>

* Fixed alignment

* Formatting

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Format pad

* Format

* Clang format

* format

* format

* don't change so much stuff

* clang format and update to bool

* fix duplicates

* don't need to fix the padding

* make circular bool

* duplicate again

* rename vulkan to wrap around

* Don't need indent

* moved to const expr

* removed unneded extra line break

* More readable method calls

* Minor wording changes

* Added final newline

* Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Added circular pad ext tests

* Gate non circular pad devices

* Cleaned gating of non-circular pad devices

---------

Co-authored-by: Phylliida <phylliidadev@gmail.com>
Co-authored-by: Ruben Ortlam <picard12@live.de>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-06 15:07:02 +01:00
shalinib-ibm 3a0d10533a
Q4/Q8 Tiled Gemm Optimization. (#16999) 2025-12-05 19:41:51 +08:00
Alberto Cabrera Pérez 87a2084c45
ggml-cpu : remove asserts always evaluating to false (#17728) 2025-12-04 13:16:38 +01:00
Herman Semenoff dea9ba27cb
ggml-cpu: remove duplicate conditional check 'iid' (#17650) 2025-12-04 05:03:19 +08:00
Reese Levine 7ca5991d2b
ggml webgpu: add support for emscripten builds (#17184)
* Faster tensors (#8)

Add fast matrix and matrix/vector multiplication.

* Use map for shader replacements instead of pair of strings

* Wasm (#9)

* webgpu : fix build on emscripten

* more debugging stuff

* test-backend-ops: force single thread on wasm

* fix single-thread case for init_tensor_uniform

* use jspi

* add pthread

* test: remember to set n_thread for cpu backend

* Add buffer label and enable dawn-specific toggles to turn off some checks

* Intermediate state

* Fast working f16/f32 vec4

* Working float fast mul mat

* Clean up naming of mul_mat to match logical model, start work on q mul_mat

* Setup for subgroup matrix mat mul

* Basic working subgroup matrix

* Working subgroup matrix tiling

* Handle weirder sg matrix sizes (but still % sg matrix size)

* Working start to gemv

* working f16 accumulation with shared memory staging

* Print out available subgroup matrix configurations

* Vectorize dst stores for sg matrix shader

* Gemv working scalar

* Minor set_rows optimization (#4)

* updated optimization, fixed errors

* non vectorized version now dispatches one thread per element

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Comment on dawn toggles

* Working subgroup matrix code for (semi)generic sizes

* Remove some comments

* Cleanup code

* Update dawn version and move to portable subgroup size

* Try to fix new dawn release

* Update subgroup size comment

* Only check for subgroup matrix configs if they are supported

* Add toggles for subgroup matrix/f16 support on nvidia+vulkan

* Make row/col naming consistent

* Refactor shared memory loading

* Move sg matrix stores to correct file

* Working q4_0

* Formatting

* Work with emscripten builds

* Fix test-backend-ops emscripten for f16/quantized types

* Use emscripten memory64 to support get_memory

* Add build flags and try ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>

* Remove extra whitespace

* Move wasm single-thread logic out of test-backend-ops for cpu backend

* Disable multiple threads for emscripten single-thread builds in ggml_graph_plan

* Fix .gitignore

* Add memory64 option and remove unneeded macros for setting threads to 1

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-12-03 10:25:34 +01:00
Adrien Gallouët e148380c7c
ggml : use svcntb() for SVE vector length detection (#17474)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-02 18:21:11 +02:00
Adrien Gallouët ab6726eeff
ggml : add fallback definition for HWCAP2_SVE2 (#17683)
This align with other HWCAP2 feature flags

See #17528

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-12-02 10:41:26 +02:00
Tarek Dakhran 2ba719519d
model: LFM2-VL fixes (#17577)
* Adjust to pytorch

* Add antialiasing upscale

* Increase number of patches to 1024

* Handle default marker insertion for LFM2

* Switch to flag

* Reformat

* Cuda implementation of antialias kernel

* Change placement in ops.cpp

* consistent float literals

* Pad only for LFM2

* Address PR feedback

* Rollback default marker placement changes

* Fallback to CPU implementation for antialias implementation of upscale
2025-11-30 21:57:31 +01:00
ixgbe f698a79c63
ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-29 14:56:31 +02:00
Piotr Wilkin (ilintar) ff55414c42
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Alberto Cabrera Pérez cd8370b408
ggml-cpu: aarm64: q4_K repack gemm and gemv implementations (dotprod only) (#17494)
* Enabled q4_K_4x8 path

* Fixed generic Q4_K 8x4 implementation

* wip: dotprod gemm

* Working arm q4_K dotprod gemm

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Undo acc rename

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Q4_K arm dotprod gemm

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Fix: q4_qs reinterpret from uint to int

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Removed comments

* Fixed macro guards

* Fixed unused vars in generic implementation

* Fixed unused vars in 8x4 repack

* Fixed unused vars in generic implementation, unneeded comment

* Missing arch fallback for x86

* minor : style

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-27 13:25:14 +02:00
Alberto Cabrera Pérez 5449367b21
Fix chunks being too small with small matrix sizes (#17526) 2025-11-26 13:14:54 -08:00
xctan 6ab4e50d9c
ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (#17448)
* ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16

* ggml-cpu : dedup scalar impl

* Update ggml/src/ggml-cpu/vec.h

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-26 15:33:05 +02:00
Adrien Gallouët e6923caaec
ggml : fix ARM feature verification (#17519)
On arm64 with `cmake` version 3.31.6, the final feature verification fails:

    -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod
    -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm
    -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sve
    -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success
    -- Performing Test GGML_MACHINE_SUPPORTS_sme
    -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme
    -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success
    -- Checking for ARM features using flags:
    --   -U__ARM_FEATURE_SME
    --   -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
    -- Performing Test HAVE_DOTPROD
    -- Performing Test HAVE_DOTPROD - Failed
    -- Performing Test HAVE_SVE
    -- Performing Test HAVE_SVE - Failed
    -- Performing Test HAVE_MATMUL_INT8
    -- Performing Test HAVE_MATMUL_INT8 - Failed
    -- Performing Test HAVE_FMA
    -- Performing Test HAVE_FMA - Success
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC
    -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed
    -- Performing Test HAVE_SME
    -- Performing Test HAVE_SME - Failed
    -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme

We need to explicitly replace `;` with spaces from the list to make
`CMAKE_REQUIRED_FLAGS` work correctly...

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-26 15:14:41 +02:00
Georgi Gerganov 583cb83416
ggml : add ggml_top_k (#17365)
* ggml : add ggml_top_k

* cont : add ggml_argsort_top_k

* metal : add top_k support

* ggml : cleanup

* tests : add virtual err() function for test_case

* ggml : add comments
2025-11-25 15:31:43 +02:00
Alberto Cabrera Pérez dbb852b549
ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (#16739)
* Enabled q4_K_8x8_q8_K path on ARM

* wip: I8mm qs multiplication, pending bias

* cpu : arm : REPACK gemm q4_K8x8 implementation

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Guard gemm with proper features, improved superblock scale and min calc

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* cpu: arm: Implemented REPACK gemv for Q4_K

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Removed completed TODO

* Fixed missing guards when selecting optimal repack type for Q4_K

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Fixed macro guard for gemv

* Fixed wrong comment in GEMV

* Fixed warning for unused variable

* vdotq_s32 -> ggml_vdotq_s32

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Clang-format issues

* Apply suggestions from code review

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Removed unnecessary GGML_UNUSED

* Fixed guards in q4_k gemm and gemv (repack)

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-11-24 13:08:11 +02:00
ixgbe 5f55c385cb
ggml: add RISC-V cpu-feats (#17461)
* ggml: add RISC-V cpu-feats

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* fix comment[1]

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-24 13:07:14 +02:00
Piotr Wilkin (ilintar) 845f200b28
ggml : Fix transposed SOLVE_TRI result (#17323)
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-20 12:58:21 +02:00
Adrien Gallouët 79bb743512
ggml : remove useless and error-prone variadic macros (#17399)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-20 11:18:27 +01:00
sudhiarm 3ae282a06f
kleidiai: fix zero-size array declaration (#17240) 2025-11-20 11:45:49 +02:00
ixgbe 5be353ec4a
ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (#17314)
* ggml-cpu:add RISC-V RVV (Zvfh) optimization for FP16 vector scaling

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>

* fix comment

* fix comment 2

---------

Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-20 08:09:18 +02:00
Jeremy Rand c49daff5ba
ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (#17308) 2025-11-19 14:19:00 +08:00
Adrien Gallouët cb44fc84e8
cmake : fix ARM feature verification (#17170)
* cmake : fix ARM feature verification

Use check_cxx_source_compiles to prevent conflicts with
the existing GGML_NATIVE detection code.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : unset __ARM_FEATURE when feature is disabled

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* cmake : fix scope, this is really a macro

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* arm_neon.h is useless

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-17 21:37:29 +01:00
Adrien Gallouët cb623de3fc
ggml : add missing AVX512 feature checks (#17270)
_mm512_cvtepu8_epi16        requires  __AVX512BW__
_mm512_srli_epi16           requires  __AVX512BW__
__builtin_ia32_inserti32x8  requires  __AVX512DQ__

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-17 12:12:00 +01:00
Alberto Cabrera Pérez becc4816dd
ggml-cpu: handle 3d tensors in repack mat_mul (#17241)
* ggml-cpu: handle 3d tensors in repack mul_mat

* Removed unnecessary branch, removed need for <algorithm>

* Fixed dst_ptr pointer in chunk + clang_format

* GGML_ASSERT to check wdata within bounds

* Accidental ggml.h inclusion

* Improved GGML_ASSERT on wdata boundaries

* Address performance regression in Qwen and llama.cpp due to chunking
2025-11-13 12:53:00 -08:00
Piotr Wilkin (ilintar) 389ac78b26
ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063)
* Add ops needed for new hybrid models: SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM

* Update ggml/include/ggml.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Code review

* Whitespace

* Update tests/test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* This is actually sigmoid, duh.

* Add CONST, remove TRI_KEEP, other changes from review

* Update tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Remove extra script

* Update ggml/src/ggml.c

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* moving changes from laptop [no ci]

* pre-rebase

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Refactor tests

* ggml : cleanup

* cont : fix ggml_fill srcs

* tests : add note

* ggml : add ggml_fill_inplace

* ggml : add asserts

* ggml : fix ggml_fill constant cast

* cont : ggml_tri minor

* Use TENSOR_LOCALS

* Fix regression from #14596, regenerate

* Don't make commits at night...

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-13 20:54:47 +02:00
ixgbe 1215dde7b0
ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (#17227)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-13 13:13:32 +01:00
Georgi Gerganov 2776db6c81
Revert "ggml-cpu: handle 3d tensors in repack mat_mul (#17030)" (#17233)
This reverts commit 1c398dc9ec.
2025-11-13 12:59:37 +02:00
Diego Devesa 879dec341a
ggml-cpu : use template for argsort (#17222) 2025-11-13 10:59:05 +02:00
Georgi Gerganov 374fe09cdd
ggml : use std::sort in ggml_argsort CPU implementation (#17211)
* ggml : use std::sort in ggml_argsort CPU implementation

* cont : add missing header
2025-11-12 20:43:38 +02:00
Alberto Cabrera Pérez 1c398dc9ec
ggml-cpu: handle 3d tensors in repack mat_mul (#17030)
* ggml-cpu: handle 3d tensors in repack mul_mat

* Removed unnecessary branch, removed need for <algorithm>

* Fixed dst_ptr pointer in chunk + clang_format

* GGML_ASSERT to check wdata within bounds

* Accidental ggml.h inclusion

* Improved GGML_ASSERT on wdata boundaries
2025-11-12 14:52:19 +02:00
ixgbe ca4844062b
ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (#17161)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
2025-11-11 13:41:51 +02:00
duduta 73460f6278
ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805)
* extract rotate_pairs logic from ggml_compute_forward_rope_f32

* templateify ggml_compute_forward_rope_f32 and _f16

* abort when rope type not supported, remove GLM from test-rope

* add imrope branch to switch

* add rope tests for perf

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-11 13:33:24 +02:00
Charles Xu 8c583242ad
kleidiai: add optimized per-channel kernels for Q8_0 (#16993) 2025-11-11 13:20:31 +02:00
Max Krasnyansky 395e286bc9
cpu: skip NOPs to avoid barriers (#17133)
* cpu: skip NOPs to avoid barriers

* cpu: use ggml_op_is_empty
2025-11-10 12:44:49 -08:00
Adrien Gallouët 967eb4b2bf
ggml-cpu : inspect -march and -mcpu to found the CPU (#16333)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-11-10 21:03:36 +02:00
fj-y-saito df70bedda7
arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277)
* add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_q8_K

* Surround SVE function with compiler directive

* fix compile switch

* fix coding style

* ggml : fix indent

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-10 15:12:59 +02:00
Adrien Gallouët 9eb9a1331d
Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)" (#17084)
This reverts commit 7c23f3f0d4.
2025-11-07 18:34:05 +02:00
iron 7c23f3f0d4
ggml-cpu: detect correct cpu flags for arm64 (#16229) (#16239)
When using GCC 9 and GCC 12 on the arm64 platform of ubuntu 2004,
the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags,
which results in compilation failures for certain extended instructions,
but the correct CPU flags can be obtained by using gcc -march.

Signed-off-by: lizhenneng <lizhenneng@kylinos.cn>
Co-authored-by: lizhenneng <lizhenneng@kylinos.cn>
2025-11-07 08:18:14 -08:00
xctan 7f09a680af
ggml-cpu : optimize RVV q2_k and q3_k kernels (#16887) 2025-11-06 18:12:45 +02:00