llama.cpp

Commit Graph

Author	SHA1	Message	Date
Sundaram krishnan	a0bbcdd9b6	ggml: guard KleidiAI DOWNLOAD_EXTRACT_TIMESTAMP for cmake < 3.24 (#20767 )	2026-03-19 21:36:23 +02:00
Charles Xu	3fee84e156	cmake : fix build warning when kleidiai is enabled (#20457 ) * cmake : fix build warning when kleidiai is enabled * remove LLAMA_ARG_THREADS from KleidiAI backend	2026-03-19 10:14:48 +02:00
Charles Xu	137435ff15	kleidiai : add sme fp16 compute path for q4_0 gemm on aarch64 (#20043 )	2026-03-03 11:40:26 +02:00
Talha Can Havadar	ae2d3f28a8	ggml: ggml-cpu: force-no-lto-for-cpu-feats (#19609 ) When LTO enabled in build environments it forces all builds to have LTO in place. But feature detection logic is fragile, and causing Illegal instruction errors with lto. This disables LTO for the feature detection code to prevent cross-module optimization from inlining architecture-specific instructions into the score function. Without this, LTO can cause SIGILL when loading backends on older CPUs (e.g., loading power10 backend on power9 crashes before feature check runs).	2026-02-17 13:22:46 +02:00
Daniel Bevenius	57088276d4	cmake : check if KleidiAI API has been fetched (#19640 ) This commit addresses a build issue with the KleidiAI backend when building multiple cpu backends. Commmit `3a00c98584` ("cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL") introduced a change where FetchContent_Populate is called instead of FetchContent_MakeAvailable, where the latter does handle this case (it is idempotent but FetchContent_Populate is not). I missed this during my review and I should not have commited without verifying the CI failure, sorry about that.	2026-02-15 13:59:38 +01:00
SamareshSingh	3a00c98584	cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (#19581 ) * cmake: fix KleidiAI install target failure with EXCLUDE_FROM_ALL Fix for the bug #19501 by adding EXCLUDE_FROM_ALL to FetchContent_Declare. This properly excludes KleidiAI from both build and install targets, preventing install failures when GGML_CPU_KLEIDIAI=ON is used. The KleidiAI source files are still compiled into libggml-cpu.so, preserving all functionality. * addressed code review comments	2026-02-15 06:22:53 +01:00
Charles Xu	2d6c00a9b8	kleidiai: add and integrate SVE 256-bit vector-length kernel (#18458 ) * kleidiai: add and integrate SVE 256-bit vector-length kernel * updated for review comments	2025-12-30 14:04:53 +02:00
Taimur Ahmad	f716588e63	ggml-cpu: extend support for RVV floating-point kernels (#17318 ) * cmake: add BF16 RVV flag for ggml-cpu * ggml-cpu: add floating-point conversion kernels * ggml: add floating-point kernels Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai> * ggml-cpu: fix lmul in vec_dot_bf16 * ggml-cpu: change redsum to lmul 4, fix leftover --------- Co-authored-by: Rehan Qasim <rehan.qasim@10xengineers.ai>	2025-12-18 16:02:09 +02:00
ixgbe	79d61896d3	ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (#17784 ) * ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> * cmake: enable RISC-V zihintpause extension for Spacemit builds * readme : add ZIHINTPAUSE support for RISC-V --------- Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-12-08 10:41:34 +02:00
Adrien Gallouët	e6923caaec	ggml : fix ARM feature verification (#17519 ) On arm64 with `cmake` version 3.31.6, the final feature verification fails: -- ARM detected flags: -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs -- Performing Test GGML_MACHINE_SUPPORTS_dotprod -- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Success -- Performing Test GGML_MACHINE_SUPPORTS_i8mm -- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Success -- Performing Test GGML_MACHINE_SUPPORTS_sve -- Performing Test GGML_MACHINE_SUPPORTS_sve - Success -- Performing Test GGML_MACHINE_SUPPORTS_sme -- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed -- Performing Test GGML_MACHINE_SUPPORTS_nosme -- Performing Test GGML_MACHINE_SUPPORTS_nosme - Success -- Checking for ARM features using flags: -- -U__ARM_FEATURE_SME -- -mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme -- Performing Test HAVE_DOTPROD -- Performing Test HAVE_DOTPROD - Failed -- Performing Test HAVE_SVE -- Performing Test HAVE_SVE - Failed -- Performing Test HAVE_MATMUL_INT8 -- Performing Test HAVE_MATMUL_INT8 - Failed -- Performing Test HAVE_FMA -- Performing Test HAVE_FMA - Success -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC -- Performing Test HAVE_FP16_VECTOR_ARITHMETIC - Failed -- Performing Test HAVE_SME -- Performing Test HAVE_SME - Failed -- Adding CPU backend variant ggml-cpu: -U__ARM_FEATURE_SME;-mcpu=neoverse-v2+crc+sve2-aes+sve2-sha3+nossbs+dotprod+i8mm+sve+nosme We need to explicitly replace `;` with spaces from the list to make `CMAKE_REQUIRED_FLAGS` work correctly... Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-26 15:14:41 +02:00
ixgbe	5f55c385cb	ggml: add RISC-V cpu-feats (#17461 ) * ggml: add RISC-V cpu-feats Signed-off-by: Wang Yang <yangwang@iscas.ac.cn> * fix comment[1] --------- Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-24 13:07:14 +02:00
Jeremy Rand	c49daff5ba	ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (#17308 )	2025-11-19 14:19:00 +08:00
Adrien Gallouët	cb44fc84e8	cmake : fix ARM feature verification (#17170 ) * cmake : fix ARM feature verification Use check_cxx_source_compiles to prevent conflicts with the existing GGML_NATIVE detection code. Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : unset __ARM_FEATURE when feature is disabled Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : fix scope, this is really a macro Signed-off-by: Adrien Gallouët <angt@huggingface.co> * arm_neon.h is useless Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-17 21:37:29 +01:00
Charles Xu	8c583242ad	kleidiai: add optimized per-channel kernels for Q8_0 (#16993 )	2025-11-11 13:20:31 +02:00
Adrien Gallouët	967eb4b2bf	ggml-cpu : inspect -march and -mcpu to found the CPU (#16333 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-10 21:03:36 +02:00
Adrien Gallouët	9eb9a1331d	Revert "ggml-cpu: detect correct cpu flags for arm64 (#16229 ) (#16239 )" (#17084 ) This reverts commit `7c23f3f0d4`.	2025-11-07 18:34:05 +02:00
iron	7c23f3f0d4	ggml-cpu: detect correct cpu flags for arm64 (#16229 ) (#16239 ) When using GCC 9 and GCC 12 on the arm64 platform of ubuntu 2004, the command "gcc -mcpu=native -E -v -" fails to detect the correct CPU flags, which results in compilation failures for certain extended instructions, but the correct CPU flags can be obtained by using gcc -march. Signed-off-by: lizhenneng <lizhenneng@kylinos.cn> Co-authored-by: lizhenneng <lizhenneng@kylinos.cn>	2025-11-07 08:18:14 -08:00
Aaron Teo	d38d9f0877	ggml: add s390x cpu-feats (#16774 )	2025-11-02 08:48:23 +08:00
Aaron Teo	4f73d0a951	ci : fix binaries release failure for s390x (binaries may not work yet) (#16664 ) * devops: initial patch Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: forgot the z15 suffix Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: attempt at impl GGML_CPU_ALL_VARIANTS for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: rm baseline version Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-10-19 23:06:39 +02:00
Charles Xu	f1eb1cb1eb	kleidiai : fix work size and threads sync for fp16 (#16246 )	2025-09-30 10:07:20 +03:00
alex-spacemit	b77e6c18e1	ggml: riscv: add riscv spacemit backend (#15288 ) * ggml: add spacemit backend Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23 * add new line at end of file Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2 * add riscv zba extension limit Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce * fixed for review comments, file renamed and format Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce * fixed for code format, after clang-format Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2 * use _Float16 instead of __fp16 Change-Id: I039fb02bb95270e641bc4442204e658735859d43 * add ci for riscv64-spacemit-ime-native Change-Id: I711c1033061df1a289ea77891b2997599dfe8279 * update debian-13-riscv64-spacemit-ime-native ci label Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a * remove license comment for spacemit ime Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3 * upgrade binutils for gcc ime Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45 * add spacemit ime cross jobs Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6 * remove native compile for riscv64-spacemit-ime Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e * ci : add caching for spacemit ime cross toolchain Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de * ci: bug fixed for cache path and env Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a * Update .github/workflows/build-linux-cross.yml for cache path Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * bugfixed for build-linux-cross.yml, syntax error Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: cailinxi <linxi.cai@spacemit.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-09-29 17:50:44 +03:00
Daniel Bevenius	24a6734daf	ggml-cpu : add check for ARM MATMUL_INT8/i8mm support (#15922 ) This commit adds a check for GGML_MACHINE_SUPPORTS_i8mm when enabling MATMUL_INT8 features, ensuring that i8mm intrinsics are only used when the target hardware actually supports them. The motivation for this is to fix ggml CI build failures where the feature detection correctly identifies that i8mm is not supported, adding the +noi8mm flag, but MATMUL_INT8 preprocessor definitions are still enabled, causing the compiler to attempt to use vmmlaq_s32 intrinsics without i8mm support. Refs: https://github.com/ggml-org/ggml/actions/runs/17525174120/job/49909199499	2025-09-11 14:39:12 +01:00
Aaron Teo	186415d595	ggml-cpu: drop support for nnpa intrinsics (#15821 )	2025-09-06 11:27:28 +08:00
xctan	05c0380f2a	ggml-cpu : optimize RVV kernels (#15720 ) * ggml-cpu : optimize rvv ggml_vec_dot_f32 * ggml-cpu : optimize 128-bit rvv ggml_vec_dot_q4_K_q8_K * ggml-cpu : fix riscv arch flags * ggml-cpu : add more rvv ops * ggml-cpu : optimize rvv ggml_vec_dot_q4_K_q8_K * ggml-cpu : optimize rvv ggml_vec_dot_q6_K_q8_K * ggml-cpu : minor rvv adjustments * ggml-cpu : fix riscv include	2025-09-03 16:16:21 +08:00
Charles Xu	4d74393bcc	ggml: update kleidiai to v1.13.0 (#15663 )	2025-08-31 00:03:42 +08:00
xctan	1cf123a343	ggml-cpu : add basic RVV support for vector f32 ops (#15057 ) * ggml-cpu : add basic RVV support for vector f32 ops * ggml-cpu : add RVV support for f32 softmax	2025-08-27 16:44:22 +08:00
Aaron Teo	ff27f80a74	ggml: initial IBM zDNN backend (#14975 ) * ggml-zdnn: inital backend impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: temp change z17 to arch15 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: fix build bugs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: tensor->extra logging check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add layout name mapping, ztensor information Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: separate logging into its own line Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add shape comparison Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: add ggml_tensor shape log Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> ggml-zdnn: fix incorrect shape logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add output buffer check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: run compute and store into tensor->extra Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more loggers Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update set_tensor logging to check only for matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: last working matmul version Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add comments to prevent accidentally deleting lines Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: support op out_prod Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update op out_prod to use tensor->extra Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rewrite the backend implementation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bugfix new impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix compiler warnings and bugfixes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: test ztensor finding in init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: implement at least 1 op to test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: assign tensor->extra to buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add check for view tensors to prevent init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rework init_tensor to create new buffers Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to std vector instead of array Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch buffers back and set to arbitrary number Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: impl init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update supports_op matmul matrix Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix incorrect ztensor shape, reduce memory padding Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: impl matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix compiler error missing type Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing data transform call Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: tighten memory usage, change string allocation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias ztensor and data free Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add bias data transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more debug info for extra buffer transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add logger to check if mat mul ops go through set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: activate bias transform in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move weights transform into mulmat Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add more safeguards in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix sequencing of transforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bugfix transform ztensor vs origtensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: figure out why sigtrap is happening Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move everything back to local declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: move bias data to local also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bring back working matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: rewrite into mre Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing vector import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing vector import in header Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt to fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix invalid ztensor buffer release Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add logging to debug free buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: remove free_buffer debug info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add parmblkformat detections Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add nnpa installed detection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add zdnn_init call for static libs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at fixing invalid buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: switch to using deque to fix pointer deref problem Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add weights logging to check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt to use unique ptr Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add tensor to pre_tfm_desc logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add inputs logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable op_none initialisation for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix missing return from init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: load ztensors in cgraph exec Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: work on moving output ztensor as well Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable logging and breakpoints for full test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at manually changing the layout Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at using default nwhc format instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable global load ztensor for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix errorenous output load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: add guards to prevent loading ztensor if transformed Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: bring load ztensor back to init routine Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix ztensor deallocation abort stabilise ggml <-> zdnn api Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: clean up matmul selection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: clean up project structure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: update documentation, prepare for upstream Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * chore: add codeowners Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: disable batched matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: attempt at fixing tensor views during matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: deny all view tensors directly Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix pr comments Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update ops docs for zdnn Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: redo test-backend-ops for ops.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-zdnn: fix typo in build-s390x.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * codeowners: remove taronaeo for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "codeowners: remove taronaeo for now" This reverts commit `411ea4ed78`. * ggml-zdnn: remove unused ggml_zdnn macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-08-15 21:11:22 +08:00
Aaron Teo	c7f3169cd5	ggml-cpu : disable GGML_NNPA by default due to instability (#14880 ) * docs: update s390x document for sentencepiece Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `e086c5e3a7`) * docs: update huggingface links + reword Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `8410b085ea`) * ggml-cpu: disable ggml-nnpa compile flag by default fixes #14877 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `412f4c7c88`) * docs: update s390x build docs to reflect nnpa disable Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `c1eeae1d0c`) --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-25 19:09:03 +02:00
Kai Pastor	60f816a79d	cmake : fix usage issues (ggml/1257) * CMake config: Create target only once Fix error on repeated find_package(ggml). For simplicity, check only for the top-level ggml::ggml. * CMake config: Add CUDA link libs * CMake config: Add OpenCL link libs * CMake config: Use canonical find_dependency Use set and append to control link lib variables. Apply more $<LINK_ONLY...>. * CMake config: Wire OpenMP dependency	2025-07-24 20:27:23 +03:00
Charles Xu	922042601b	kleidiai: add support for get_rows (#14676 ) * kleidiai: add support for get_rows * apply fixes based on code review * apply more fixes based on code review	2025-07-21 16:49:52 +03:00
Romain Biessy	a7417f5594	ggml-cpu: sycl: Re-enable exp f16 (#14462 )	2025-06-30 14:52:02 +02:00
xiaobing318	c839a2da1a	cmake : Remove redundant include path in CMakeLists.txt (#14452 ) * Update docker.yml 修改docker.yml文件中的内容使其停止周期性的运行该workflow，如果想要运行该workflow可以手动启动 * Remove redundant include path in CMakeLists.txt The parent directory '..' was removed from the include directories for the ggml-cpu-feats target, to avoid unnecessary include paths. * Enable scheduled Docker image builds Uncomments the workflow schedule to trigger daily Docker image rebuilds at 04:12 UTC, improving automation and keeping images up to date.	2025-06-30 12:48:24 +03:00
Aaron Teo	60ef23d6c1	ggml-cpu: enable IBM NNPA Vector Intrinsics (#14317 ) * ggml-cpu: add nnpa compile flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `4a9f60c201`) * ggml-cpu: add fp16->fp32 nnpa first Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `8d4a7987f9`) * ggml-cpu: add fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `0ff0d65162`) * ggml-cpu: better variable names Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `2f58bbcbb8`) * docs: update s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `01b929491b`) * ggml-cpu: add debugging prints to see if dlf16 is correct Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix print vs printf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix float placeholder Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: ensure fp16 and fp32 load and stores are called Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fp16 load ensured to hit Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove sigint from fp16 store for some reason, the function is not getting a hit when debugged with gdb. we will need to investigate further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: nnpa switch to vec_xst test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to vec_xst for 4 element loops also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rework noop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove noop, general code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarify variable naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add breakpoint for debugging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: test fix for conversion failure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: disable fp32->fp16 nnpa conversions for now there are some conversion failures in nnpa that requires the eyes of an ibm stsm. will create a separate pr to introduce the fp32->fp16 change. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to elif macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix typo Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: reattempt fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix compiler types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: change to typedef vector types Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add 4 element loops for fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarified vector naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back fp32->fp16 store nnpa Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add nnpa macro check in ggml-impl Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add missing __func__ Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: diagnose why __NNPA__ macro is not being defined Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: import vecintrin.h to fix compiler errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: update macro tests Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit `157f856c34`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to importing ggml-cpu-impl instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix macro declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: test more macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add debug prints Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bruteforce macro definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move macro definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add ggml-impl.h to cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to private macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move s390x typedef to own header file Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `157f856c34`) * ggml-cpu: move things around Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back compile macros Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: switch to quotes for import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add compiler error macro Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add s390x detection in ggml-src Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back compile definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: undo cmakelists work Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move s390x typedef to own header file" This reverts commit `18d79e1a30`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove typedefs.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove typedef from cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add ggml-impl.h future notes Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: add todo comment for future reference Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: clarify naming of dlf16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove unnecessary target compile definitions Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * docs: update broken huggingface link for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix duplicate func names during compile Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: fix duplicate func names during compile" This reverts commit `fbb733451f`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu" This reverts commit `bd288e8fa5`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: refactor fp16<->fp32 simd to ggml-cpu Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix missing simd-mappings.h import in quants.c Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix missing simd-mappings.h within repack Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix amx mmq missing simd-mappings.h Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: attempt at fixing loongarch failing build Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move nnpa together with other fp16<->fp32 simd Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: fix wrong refactor of ggml-base ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: remove dependency on ggml-cpu from ggml-base Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: remove mistaken fallback macro fallback logic was already implemented but i was too sleepy to realise Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_table_f32_f16 to ggml-cpu ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures" This reverts commit `32a3533564`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml: move ggml_table_f32_f16 to ggml-cpu" This reverts commit `9e40d984ad`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml: move ggml_table_f32_f16 to ggml-cpu ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `9e40d984ad`) * ggml: move ggml_table_f32_f16 to ggml-cpu.c Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: extern c ggml_table_f32_f16 + chore docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h we rely on the variable declaration in ggml-cpu.c instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h" This reverts commit `f71b21d2f7`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * ggml-cpu: bring back ggml_table_f32_f16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "ggml-cpu: bring back ggml_table_f32_f16" This reverts commit `2dce119178`. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * fix ggml time initialization * fix f32_f16 table init * remove extra line --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: slaren <slarengh@gmail.com>	2025-06-25 23:49:04 +02:00
Christian Kastner	6369be0735	Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286 ) * Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-20 14:17:32 +02:00
Charles Xu	9230dbe2c7	ggml: Update KleidiAI to v1.9.0 (#14277 )	2025-06-20 10:51:01 +03:00
Charles Xu	ef035803eb	ggml: Add Apple support for GGML_CPU_ALL_VARIANTS (#14258 )	2025-06-18 12:40:07 +01:00
Charles Xu	3ba0d843c6	ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206 )	2025-06-16 11:47:57 +02:00
Christian Kastner	532802f938	Implement GGML_CPU_ALL_VARIANTS for ARM (#14080 ) * ggml-cpu: Factor out feature detection build from x86 * ggml-cpu: Add ARM feature detection and scoring This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> which need to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags. * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM Like x86, however to pass around arch flags within cmake, we use GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used. * ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now The other platforms will need their own specific variants. This also fixes the bug that the the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch which restores the previous behavior.	2025-06-11 21:07:44 +02:00
xctan	f470bc36be	ggml-cpu : split arch-specific implementations (#13892 ) * move ggml-cpu-aarch64 to repack * split quantize_row_q8_0/1 * split helper functions * split ggml_vec_dot_q4_0_q8_0 * split ggml_vec_dot_q4_1_q8_1 * split ggml_vec_dot_q5_0_q8_0 * split ggml_vec_dot_q5_1_q8_1 * split ggml_vec_dot_q8_0_q8_0 * split ggml_vec_dot_tq1_0_q8_K * split ggml_vec_dot_tq2_0_q8_K * split ggml_vec_dot_q2_K_q8_K * split ggml_vec_dot_q3_K_q8_K * split ggml_vec_dot_q4_K_q8_K * split ggml_vec_dot_q5_K_q8_K * split ggml_vec_dot_q6_K_q8_K * split ggml_vec_dot_iq2_xxs_q8_K * split ggml_vec_dot_iq2_xs_q8_K * split ggml_vec_dot_iq2_s_q8_K * split ggml_vec_dot_iq3_xxs_q8_K * split ggml_vec_dot_iq3_s_q8_K * split ggml_vec_dot_iq1_s_q8_K * split ggml_vec_dot_iq1_m_q8_K * split ggml_vec_dot_iq4_nl_q8_0 * split ggml_vec_dot_iq4_xs_q8_K * fix typos * fix missing prototypes * rename ggml-cpu-quants.c * rename ggml-cpu-traits * rename arm folder * move cpu-feats-x86.cpp * rename ggml-cpu-hbm * update arm detection macro in quants.c * move iq quant tables * split ggml_quantize_mat_q8_0/K * split ggml_gemv_* * split ggml_gemm_* * rename namespace aarch64 to repack * use weak aliases to replace test macros * rename GGML_CPU_AARCH64 to GGML_CPU_REPACK * rename more aarch64 to repack * clean up rebase leftover * fix compilation errors * remove trailing spaces * try to fix clang compilation errors * try to fix clang compilation errors again * try to fix clang compilation errors, 3rd attempt * try to fix clang compilation errors, 4th attempt * try to fix clang compilation errors, 5th attempt * try to fix clang compilation errors, 6th attempt * try to fix clang compilation errors, 7th attempt * try to fix clang compilation errors, 8th attempt * try to fix clang compilation errors, 9th attempt * more cleanup * fix compilation errors * fix apple targets * fix a typo in arm version of ggml_vec_dot_q4_K_q8_K Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-06-09 16:47:13 +02:00
shalinib-ibm	093e3f1feb	cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966 ) Some systems report the CPU implementation as "Power11" instead of "POWER11". The existing CMake logic uses a case-sensitive regular expression to extract the CPU generation, which fails when the casing doesn't exactly match "POWER". This patch provides a fix by first converting the string to uppercase before applying the regex. Signed-off-by: root <root@rheldb2v.pperf.tadn.ibm.com> Co-authored-by: root <root@rheldb2v.pperf.tadn.ibm.com>	2025-06-02 15:18:36 +03:00
Christian Kastner	21fcc21ad5	cmake: Factor out CPU architecture detection (#13883 ) * cmake: Define function for querying architecture The tests and results match exactly those of ggml/src/CMakeLists.txt * Switch arch detection over to new function	2025-05-29 12:50:25 +02:00
xctan	05f6ac6283	ggml : riscv: add xtheadvector support (#13720 ) * ggml : riscv: add xtheadvector support * ggml : clean up some macro usage	2025-05-27 16:21:36 +03:00
Christian Kastner	7fe03e7446	ggml-cpu: x86 feature detection is specific to x86 (#13811 )	2025-05-27 13:18:39 +02:00
Dan Johansson	4f711afed5	ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509 ) Signed-off-by: Dan Johansson <dan.johansson@arm.com>	2025-05-13 18:02:28 +03:00
Dan Johansson	a71a4075cd	ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053 ) * ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel Signed-off-by: Dan Johansson <dan.johansson@arm.com> * * code review fixes Signed-off-by: Dan Johansson <dan.johansson@arm.com> * * adds a comment that clarifies barrier usage Signed-off-by: Dan Johansson <dan.johansson@arm.com> --------- Signed-off-by: Dan Johansson <dan.johansson@arm.com> Co-authored-by: Charles Xu <charles.xu@arm.com>	2025-05-12 13:06:19 +02:00
Aaron Teo	44cd8d91ff	feat(ggml-cpu): enable z17 compile (#13182 ) z17 compilation requires GCC 15.1.0 and onwards Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-04-30 10:47:35 +01:00
Diego Devesa	1d735c0b4f	ggml : add SSE 4.2 and x64 base variant for CPUs without AVX (#12871 ) * ggml : add SSE 4.2 variant for CPUs without AVX * ggml : add x64 base ABI variant	2025-04-21 18:13:51 +02:00
cmdr2	995083e4ed	cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167) * cpu: refactor SIMD mappings and vectorized op functions into separate files * Fix warning for ggml_float to float * Fix warnings * cpu: move all the operations (except mul_mat) to a separate c++ file * fix whitespace * Update ggml/src/ggml-cpu/vec.h Co-authored-by: Diego Devesa <slarengh@gmail.com> * Fix PR comments - use GGML_UNUSED, use cassert in ops.cpp * Reverse the order of import for ops.h and vec.h, to match what was present in ggml-cpu.c previously --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-04-07 18:44:17 +03:00
cmdr2	a62d7fa7a9	cpu: de-duplicate some of the operators and refactor (ggml/1144) * cpu: de-duplicate some of the operators and refactor * Fix PR comments * Fix PR comments	2025-03-30 08:33:31 +03:00
Georgi Gerganov	0306aad1ca	cmake : sync/merge PowerPC build commands (#0 )	2025-03-27 09:04:38 +02:00

1 2

76 Commits