* Boilerplate for q6_K repack
* q6_K repack to q6_Kx8 implementation
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* q6_K generic gemv and gemm
* WIP: gemm_q6_K 8x8
* Still WIP: loading of q8s, q6h and q6l
* first working version of q6_K gemm
* Moved q6 loads outside of the sb block; unrolled the inner loop
* Replaced modulo with mask
* First implementation of GEMV
* ggml_vdotq_s32 -> vdotq_s32
* Reduce width of accumulators in q6_K gemv
* Use bsums instead of calculating the bias. Preload scales to use vget_lane. Unroll.
* Reuse scales in GEMM (same GEMV opt)
* Added todos for bsum and different qh repack
* Arch fallback
* VSLIQ for merging qh and ql (see the sketch below)
* Removed TODO, already tested
* Apply suggestions
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Removed unused import
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
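The VSLIQ bullet above refers to NEON's shift-left-and-insert instruction. A minimal hedched illustration of the trick, not the actual repack kernel (names are illustrative, and it assumes qh's low two bits already hold the high-bit pair for each lane):

```c
// Merge the 4 low bits (ql) and 2 high bits (qh) of a q6_K weight with a
// single VSLI (shift-left-and-insert) instead of AND + SHL + ORR.
#include <arm_neon.h>

static inline uint8x16_t q6k_merge_bits(uint8x16_t ql, uint8x16_t qh) {
    // vsliq_n_u8(a, b, 4) = (b << 4) | (a & 0x0F) per lane, so the 2-bit
    // qh values land in bits 4..5 while ql keeps bits 0..3.
    return vsliq_n_u8(ql, qh, 4);
}
```

Note that the actual kernels do not subtract the q6_K −32 offset per element; as the bullets above say, the bias is folded in through the precomputed bsums.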
* ggml-cpu: Use tiled FA for prompt-processing
FA performance is severely degraded on CPU at long contexts because it essentially uses a vector kernel. This PR adds a tiled FA path for prompt processing; a minimal sketch follows the list below. Tile sizes were tuned on an AMD EPYC single-socket 64-core machine.
* fix out of bounds for mask
* skip rows that are fully masked
* skip tile if mask is inf
* store mask in worksize
* check inf tile earlier
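A single-threaded sketch of the tiled online-softmax scheme described above, under stated assumptions: the tile size is illustrative, the mask is assumed pre-added into the scores (a masked position would make `s[t]` equal `-INFINITY`), and the real kernel adds vectorization, GQA handling, and threading:

```c
#include <math.h>
#include <string.h>

#define TILE_KV 64  // illustrative; the PR tunes this per machine

// Q: [n_q][d], K,V: [n_kv][d], out: [n_q][d], all row-major f32.
void fa_tiled_pp(const float *Q, const float *K, const float *V,
                 float *out, int n_q, int n_kv, int d, float scale) {
    float s[TILE_KV];
    for (int iq = 0; iq < n_q; iq++) {
        float M = -INFINITY, S = 0.0f;               // running max / sum
        float *o = out + (size_t) iq * d;
        memset(o, 0, (size_t) d * sizeof(float));
        for (int kv0 = 0; kv0 < n_kv; kv0 += TILE_KV) {
            int len = n_kv - kv0 < TILE_KV ? n_kv - kv0 : TILE_KV;
            float m_tile = -INFINITY;
            for (int t = 0; t < len; t++) {          // scores for one KV tile
                const float *k = K + (size_t)(kv0 + t) * d;
                float acc = 0.0f;
                for (int c = 0; c < d; c++) acc += Q[(size_t) iq * d + c] * k[c];
                s[t] = acc * scale;
                if (s[t] > m_tile) m_tile = s[t];
            }
            if (m_tile == -INFINITY) continue;       // fully-masked tile: skip
            float M_new = m_tile > M ? m_tile : M;
            float ms = expf(M - M_new);              // rescale old accumulator
            for (int c = 0; c < d; c++) o[c] *= ms;
            S *= ms;
            for (int t = 0; t < len; t++) {
                float vs = expf(s[t] - M_new);
                const float *v = V + (size_t)(kv0 + t) * d;
                for (int c = 0; c < d; c++) o[c] += vs * v[c];
                S += vs;
            }
            M = M_new;
        }
        if (S > 0.0f) for (int c = 0; c < d; c++) o[c] /= S;  // fully-masked row stays 0
    }
}
```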
* Boilerplate for q5_Kx8 REPACK on ARM and fallback
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Implements make_block_q5_Kx8 by extending make_block_q4_Kx8
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* q5_K repack gemm and gemv generics
* Gemm and Gemv ARM implementations (i8mm)
* Improved qh manipulation based on the non-repack vec_dot implementation
* Full unroll
* Apply Q5_K Gemv vand and vshl optimizations to gemm. Improve comments.
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Fix wrong fallback definitions of Q5_K
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Fixed comments. Reverted unnecessary formatting
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Fixed typo in generic definitions
* Replace AND + shift with shift-insert. Better op interleaving.
* Vectorize + unroll the block scales
* Apply gemm optimizations to gemv
* Improve bias calculation
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
This change enables the repack stage to utilize the user-specified
thread count, ensuring that both the logical thread IDs and the total
number of threads remain consistent between the repack and inference
stages.
In a NUMA architecture where the `--numa distribute` parameter is used,
logical threads are pinned to specific physical NUMA nodes. By aligning
the thread configuration across these two stages, we can fully leverage
the operating system's "first-touch" memory allocation policy:
1. Repack Stage: Logical thread i (bound to NUMA node j) is responsible
for repacking and writing the weight data. Since the "first touch"
occurs within this thread, the corresponding physical memory is
allocated on node j.
2. Inference Stage: The same logical thread i (still bound to node j)
reads these weights. Since the data already resides on the local
node, low-latency local memory access is achieved.
Without ensuring consistency in the number of threads, data may be
randomly allocated to mismatched nodes, resulting in significant
cross-node access overhead during inference.
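A hedged illustration of the first-touch mechanics described above. This is a hypothetical standalone demo, not the llama.cpp code; it requires libnuma (`cc demo.c -lnuma -lpthread`):

```c
// Each worker pins itself to a NUMA node and then writes ("first-touches")
// its slice of a shared buffer, so the kernel backs those pages with
// memory local to that node.
#define _GNU_SOURCE
#include <numa.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *base; size_t off, len; int node; } slice_t;

static void *repack_worker(void *arg) {
    slice_t *s = (slice_t *) arg;
    numa_run_on_node(s->node);           // same node the matching compute
                                         // thread will run on later
    memset(s->base + s->off, 1, s->len); // first touch: pages land on s->node
    return NULL;
}

int main(void) {
    if (numa_available() < 0) return 1;
    int n = numa_num_configured_nodes();
    size_t total = (size_t) 16 << 20;    // 16 MiB demo buffer
    char *buf = malloc(total);
    pthread_t tid[16];
    slice_t   sl[16];
    for (int i = 0; i < n && i < 16; i++) {
        sl[i] = (slice_t) { buf, total / n * i, total / n, i };
        pthread_create(&tid[i], NULL, repack_worker, &sl[i]);
    }
    for (int i = 0; i < n && i < 16; i++) pthread_join(tid[i], NULL);
    free(buf);
}
```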
Signed-off-by: Jianhui Zhou <jonaszhou@zhaoxin.com>
When using repack buffer type, the physical memory allocation is dictated
by the first-touch policy. Since the main thread performs the write
operations, memory is often allocated on a single NUMA node, leading to
uneven weight distribution.
Multi-threaded repack can alleviate this problem, but the threads are
not bound to NUMA nodes.
This patch applies the same thread affinity strategy (--numa distribute)
to the repacking phase. By binding the repack threads to the same nodes
as the compute threads, we ensure that weights are written (and thus
allocated) on the local NUMA node, minimizing cross-node memory access
during inference.
Performance on Intel Xeon Silver 4514Y (32 cores):
qwen3 8B Q4_K: 19.39 -> 26.92 t/s (+39%)
qwen3 32B Q4_K: 4.99 -> 7.38 t/s (+48%)
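A rough sketch of the distribute-style binding assumed above (hypothetical helper; the actual `--numa distribute` logic lives in ggml's threadpool). Thread i is restricted to the CPUs of node `i % n_nodes`, so repack and compute threads with the same index share a node:

```c
#define _GNU_SOURCE
#include <sched.h>
#include <numa.h>

// Bind the calling thread to all CPUs of one NUMA node, round-robin by
// thread index. Returns 0 on success.
static int pin_thread_to_node(int thread_idx) {
    int n_nodes = numa_num_configured_nodes();
    int node    = thread_idx % n_nodes;
    struct bitmask *bm = numa_allocate_cpumask();
    if (numa_node_to_cpus(node, bm) != 0) return -1;
    cpu_set_t set;
    CPU_ZERO(&set);
    for (unsigned cpu = 0; cpu < bm->size; cpu++)
        if (numa_bitmask_isbitset(bm, cpu)) CPU_SET(cpu, &set);
    numa_free_cpumask(bm);
    // 0 = calling thread; every CPU of the node stays eligible
    return sched_setaffinity(0, sizeof(set), &set);
}
```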
Signed-off-by: Jianhui Zhou <jonaszhou@zhaoxin.com>
* ggml-cpu: fix RISC-V Q4_0 repack selection and RVV feature reporting
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* using the name VLEN instead of CNT
* Update ggml/include/ggml-cpu.h
---------
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* tests: update barrier test to check for race condition in active threads
* cpu: combine n_graph and n_threads into a single atomic update (see the packing sketch below)
* tests: add multi-graph test for test_barrier
* ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (see the relax sketch below)
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* cmake: enable RISC-V zihintpause extension for Spacemit builds
* readme : add ZIHINTPAUSE support for RISC-V
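For the combined atomic update above, a hedged sketch of the packing idea. The field layout and names are hypothetical; the point is that both counters change in one atomic store, closing the race window between two separate updates:

```c
#include <stdatomic.h>
#include <stdint.h>

// High 32 bits: graph counter. Low 32 bits: active-thread count.
static _Atomic uint64_t state;

static void publish_graph(uint32_t n_graph, uint32_t n_threads) {
    atomic_store_explicit(&state,
        ((uint64_t) n_graph << 32) | n_threads, memory_order_release);
}

static void read_graph(uint32_t *n_graph, uint32_t *n_threads) {
    uint64_t s = atomic_load_explicit(&state, memory_order_acquire);
    *n_graph   = (uint32_t) (s >> 32);
    *n_threads = (uint32_t) s;
}
```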
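And for ggml_thread_cpu_relax, an illustrative spin-wait hint (the function name mirrors the bullet; the guards are how such code is usually written, not a copy of the patch):

```c
// On RISC-V with the Zihintpause extension, `pause` tells the core to back
// off inside busy-wait loops, analogous to x86's `pause`.
static inline void cpu_relax(void) {
#if defined(__riscv) && defined(__riscv_zihintpause)
    __asm__ volatile ("pause");
#elif defined(__x86_64__) || defined(__i386__)
    __asm__ volatile ("pause");
#else
    __asm__ volatile ("" ::: "memory");  // plain compiler barrier fallback
#endif
}
```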
---------
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* Feat: Added vulkan circular tiling support
* Feat: Added cpu circular
* Feat: Added cuda kernels
* Added tests
* Added tests
* Removed non-pad operations
* Removed unneeded changes
* removed backend non-pad tests
* Update test-backend-ops.cpp
* Fixed comment on pad test
* removed trailing whitespace
* Removed unneeded test in test-backend-ops
* Removed calls to the removed test
* Update ggml/src/ggml-vulkan/vulkan-shaders/pad.comp
Co-authored-by: Ruben Ortlam <picard12@live.de>
* Fixed alignment
* Formatting
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
* Format pad
* Format
* Clang format
* format
* format
* don't change so much stuff
* clang format and update to bool
* fix duplicates
* don't need to fix the padding
* make circular bool
* duplicate again
* rename vulkan to wrap around
* Don't need indent
* moved to const expr
* removed unneeded extra line break
* More readable method calls
* Minor wording changes
* Added final newline
* Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/include/ggml.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Added circular pad ext tests
* Gate non-circular pad devices
* Cleaned gating of non-circular pad devices
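A 1-D sketch of the wrap-around indexing the circular flag enables: out-of-range coordinates wrap modulo the source extent instead of producing zeros. The helper is hypothetical; the backends apply the same arithmetic per dimension:

```c
// C's % can return a negative remainder for i < 0, so fold it back.
static inline int wrap_coord(int i, int n) {
    int m = i % n;
    return m < 0 ? m + n : m;
}

// dst has n_src + lp + rp elements; element j reads src with wrap-around.
void pad_circular_1d(const float *src, int n_src, float *dst, int lp, int rp) {
    for (int j = 0; j < n_src + lp + rp; j++)
        dst[j] = src[wrap_coord(j - lp, n_src)];
}
```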
---------
Co-authored-by: Phylliida <phylliidadev@gmail.com>
Co-authored-by: Ruben Ortlam <picard12@live.de>
Co-authored-by: Aman Gupta <amangupta052@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Faster tensors (#8)
Add fast matrix and matrix/vector multiplication.
* Use map for shader replacements instead of pair of strings
* Wasm (#9)
* webgpu : fix build on emscripten
* more debugging stuff
* test-backend-ops: force single thread on wasm
* fix single-thread case for init_tensor_uniform
* use jspi
* add pthread
* test: remember to set n_thread for cpu backend
* Add buffer label and enable dawn-specific toggles to turn off some checks
* Intermediate state
* Fast working f16/f32 vec4
* Working float fast mul mat
* Clean up naming of mul_mat to match logical model, start work on q mul_mat
* Setup for subgroup matrix mat mul
* Basic working subgroup matrix
* Working subgroup matrix tiling
* Handle weirder sg matrix sizes (still a multiple of the sg matrix size)
* Working start to gemv
* working f16 accumulation with shared memory staging
* Print out available subgroup matrix configurations
* Vectorize dst stores for sg matrix shader
* Gemv working scalar
* Minor set_rows optimization (#4)
* updated optimization, fixed errors
* non-vectorized version now dispatches one thread per element
* Simplify
* Change logic for set_rows pipelines
---------
Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
* Comment on dawn toggles
* Working subgroup matrix code for (semi)generic sizes
* Remove some comments
* Cleanup code
* Update dawn version and move to portable subgroup size
* Try to fix new dawn release
* Update subgroup size comment
* Only check for subgroup matrix configs if they are supported
* Add toggles for subgroup matrix/f16 support on nvidia+vulkan
* Make row/col naming consistent
* Refactor shared memory loading
* Move sg matrix stores to correct file
* Working q4_0
* Formatting
* Work with emscripten builds
* Fix test-backend-ops emscripten for f16/quantized types
* Use emscripten memory64 to support get_memory
* Add build flags and try ci
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* Remove extra whitespace
* Move wasm single-thread logic out of test-backend-ops for cpu backend
* Disable multiple threads for emscripten single-thread builds in ggml_graph_plan
* Fix .gitignore
* Add memory64 option and remove unneeded macros for setting threads to 1
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* Adjust to PyTorch
* Add antialiasing to upscale (sketched after this list)
* Increase number of patches to 1024
* Handle default marker insertion for LFM2
* Switch to flag
* Reformat
* CUDA implementation of antialias kernel
* Change placement in ops.cpp
* consistent float literals
* Pad only for LFM2
* Address PR feedback
* Rollback default marker placement changes
* Fall back to the CPU implementation for the antialias path of upscale
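A hedged 1-D sketch of the PyTorch-style antialiased resize being matched: when downscaling, the bilinear (triangle) kernel support is widened by the scale factor so source pixels are averaged rather than skipped. Scalar illustrative code; the actual kernels are 2-D and live in the upscale op:

```c
#include <math.h>

static float triangle(float x) { return x < 1.0f ? 1.0f - x : 0.0f; }

void resize_1d_antialias(const float *src, int n_src, float *dst, int n_dst) {
    float scale   = (float) n_src / n_dst;        // > 1 when downscaling
    float support = scale > 1.0f ? scale : 1.0f;  // widen kernel when shrinking
    for (int j = 0; j < n_dst; j++) {
        float center = (j + 0.5f) * scale;
        int   lo = (int) floorf(center - support);
        int   hi = (int) ceilf (center + support);
        float sum = 0.0f, wsum = 0.0f;
        for (int i = lo; i < hi; i++) {
            int   ic = i < 0 ? 0 : (i >= n_src ? n_src - 1 : i); // clamp edges
            float w  = triangle(fabsf(i + 0.5f - center) / support);
            sum += w * src[ic]; wsum += w;
        }
        dst[j] = wsum > 0.0f ? sum / wsum : 0.0f;
    }
}
```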
* Qwen3 Next - cleaned up version
* Whitespaces and stuff
* Correct minor errors
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Misc. fixes.
* Clean up code, add missing hybrid qualifier
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...
* Whitespace
* Proper tensors for cb calls
* Use llama-graph.h vertical alignment
* BROKEN: chunking
* Set new tensors as inputs.
* Proper chunk logic
* It's the circle of life...
* More shenanigans for n_seq > 1
* Nail in the coffin?
* Fix Windows build
* Eh, one fails on Windows, the other fails on Mac... just use general capture.
* quant : cleanup
* model : cleanup
* qwen3 : cleanup
* cont : cleanup
* cont : cleanup
* ggml : revert change
* qwen3 : cleanup
* cont : cleanup
* Re-add cmath
* qwen3 : fix typo
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Usual suspects
* fix my bad suggestion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Enabled q4_K_4x8 path
* Fixed generic Q4_K 8x4 implementation
* wip: dotprod gemm
* Working arm q4_K dotprod gemm (see the vdotq_s32 sketch below)
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Undo acc rename
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Q4_K arm dotprod gemm
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Fix: q4_qs reinterpret from uint to int
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
* Removed comments
* Fixed macro guards
* Fixed unused vars in generic implementation
* Fixed unused vars in 8x4 repack
* Fixed unused vars in generic implementation, unneeded comment
* Missing arch fallback for x86
* minor : style
---------
Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
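The dotprod GEMMs above are built from vdotq_s32 (also referenced by the earlier `ggml_vdotq_s32 -> vdotq_s32` bullet). A minimal hedged illustration of that building block, not the repack kernel; it assumes n is a multiple of 16 and requires `-march=armv8.2-a+dotprod`:

```c
#include <arm_neon.h>
#include <stdint.h>

static inline int32_t dot_q8_block(const int8_t *a, const int8_t *b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);
        int8x16_t vb = vld1q_s8(b + i);
        acc = vdotq_s32(acc, va, vb);   // 4 lanes, each a 4-way i8 dot product
    }
    return vaddvq_s32(acc);             // horizontal sum of the 4 lanes
}
```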
On arm64 with `cmake` version 3.31.6, the final feature verification fails:
-- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs
-- Performing Test GGML_MACHINE_SUPPORTS_dotprod
-- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success
-- Performing Test GGML_MACHINE_SUPPORTS_i8mm
-- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success
-- Performing Test GGML_MACHINE_SUPPORTS_sve
-- Performing Test GGML_MACHINE_SUPPORTS_sve - Success
-- Performing Test GGML_MACHINE_SUPPORTS_sme
-- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
-- Performing Test GGML_MACHINE_SUPPORTS_nosme
-- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success
-- Checking for ARM features using flags:
-- -U__ARM_FEATURE_SME
-- -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
-- Performing Test HAVE_DOTPROD
-- Performing Test HAVE_DOTPROD - Failed
-- Performing Test HAVE_SVE
-- Performing Test HAVE_SVE - Failed
-- Performing Test HAVE_MATMUL_INT8
-- Performing Test HAVE_MATMUL_INT8 - Failed
-- Performing Test HAVE_FMA
-- Performing Test HAVE_FMA - Success
-- Performing Test HAVE_FP16_VECTOR_ARITHMETIC
-- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed
-- Performing Test HAVE_SME
-- Performing Test HAVE_SME - Failed
-- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme
We need to explicitly replace the `;` list separators with spaces for
`CMAKE_REQUIRED_FLAGS` to work correctly, since it expects a single
space-separated string rather than a CMake list...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* ggml: add RISC-V cpu-feats
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* fix comment[1]
---------
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* ggml-cpu: handle 3d tensors in repack mul_mat (outlined below)
* Removed unnecessary branch, removed need for <algorithm>
* Fixed dst_ptr in chunk + clang-format
* GGML_ASSERT to check wdata within bounds
* Remove accidental ggml.h inclusion
* Improved GGML_ASSERT on wdata boundaries
* Address performance regression in Qwen and llama.cpp due to chunking
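A minimal outline of the 3-D handling referenced above, under stated assumptions: contiguous float tensors and an existing 2-D kernel (the real code works on quantized repacked blocks and offsets by the tensors' `nb[2]` byte strides):

```c
#include <stdint.h>

// 2-D kernel assumed to exist elsewhere; the signature is illustrative.
void mul_mat_2d(const float *A, const float *B, float *C,
                int64_t m, int64_t n, int64_t k);

// 3-D case: treat dim 2 as a batch and reuse the 2-D path per plane.
void mul_mat_3d(const float *A, const float *B, float *C,
                int64_t m, int64_t n, int64_t k, int64_t batch) {
    for (int64_t b = 0; b < batch; b++) {
        mul_mat_2d(A + b * m * k, B + b * k * n, C + b * m * n, m, n, k);
    }
}
```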