Aaron Teo
60ef23d6c1
ggml-cpu: enable IBM NNPA Vector Intrinsics ( #14317 )
* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201 )
* ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9 )
* ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162 )
* ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8 )
* docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b )
* ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove sigint from fp16 store
For some reason, the function is not getting hit when debugged with
gdb. We will need to investigate further.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
There are some conversion failures in NNPA that require the eyes of an
IBM STSM. Will create a separate PR to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 157f856c34 )
* ggml-cpu: move things around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa5 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove mistaken fallback macro
Fallback logic was already implemented, but I was too sleepy to realise it.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 9e40d984ad )
* ggml: move ggml_table_f32_f16 to ggml-cpu.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f7 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: slaren <slarengh@gmail.com>
2025-06-25 23:49:04 +02:00
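For context on the entry above: the z16 NNPA facility converts through an intermediate DLFloat16 format rather than operating on IEEE fp16 directly, which is what the dlf16 debugging in the log is chasing. Below is a minimal sketch of the compile-time dispatch this PR sets up; the vector typedef and the `vec_*` intrinsic names are recalled from the PR discussion and should be treated as assumptions, not the exact merged code.

```cpp
// Sketch only: assumed shape of the __NNPA__ dispatch, not the merged code.
#include <cmath>
#include <cstdint>

#if defined(__NNPA__)
#include <vecintrin.h>

typedef __vector unsigned short uint16x8_t; // the PR adds typedefs like this

static inline float fp16_to_fp32(uint16_t h) {
    uint16x8_t v_h  = vec_splats(h);
    // Two conversion steps (assumed intrinsic names): IEEE fp16 -> DLFloat16
    // (VCNF), then DLFloat16 -> fp32 on the high elements (VCLFNH).
    uint16x8_t v_hd = vec_convert_from_fp16(v_h, 0);
    return vec_extend_to_fp32_hi(v_hd, 0)[0];
}
#else
// Portable scalar fallback: decode the IEEE half-precision fields directly.
static inline float fp16_to_fp32(uint16_t h) {
    const uint32_t exp = (h >> 10) & 0x1f;
    const uint32_t man =  h        & 0x3ff;
    float out;
    if (exp == 0) {
        out = (float)man * 5.9604644775390625e-8f;  // subnormal: man * 2^-24
    } else if (exp == 31) {
        out = man ? NAN : INFINITY;                 // inf / nan
    } else {
        out = std::ldexp(1.0f + (float)man / 1024.0f, (int)exp - 15);
    }
    return (h & 0x8000) ? -out : out;
}
#endif
```

The reverse direction would presumably pair vec_round_from_fp32 with vec_convert_to_fp16 (again assumed names), which is the fp32->fp16 path that went through the disable/reattempt cycle above.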
Sigbjørn Skjæret
b193d53069
ggml : do not output unprintable characters on GGUF load failure ( #14381 )
2025-06-25 23:26:51 +02:00
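A GGUF file is untrusted input, so error paths that echo bytes from it need sanitizing before logging. A hypothetical sketch of the idea (not the actual patch):

```cpp
// Hypothetical helper: escape unprintable bytes from an untrusted string
// before embedding it in an error message.
#include <cctype>
#include <cstdio>
#include <string>

static std::string sanitize_for_log(const std::string & s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        if (std::isprint(c)) {
            out += (char)c;
        } else {
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\x%02x", c); // escape as hex
            out += buf;
        }
    }
    return out;
}
```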
Anton Mitkov
2bf9d539dd
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices ( #13973 )
2025-06-25 18:09:55 +02:00
lhez
73e53dc834
opencl: ref count `ggml_backend_opencl_context` and refactor profiling ( #14254 )
* Move profiling info into `ggml_backend_opencl_context`
* Add `enqueue_ndrange_kernel` to launch kernel
2025-06-24 11:46:25 -07:00
uvos
0142961a2e
CUDA/HIP: optimize mmv paths taken for HIP devices ( #14324 )
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-06-24 01:12:56 +02:00
Johannes Gäßler
defe2158dd
CUDA: mul_mat_v support for batch sizes > 1 ( #14262 )
* CUDA: mul_mat_v support for batch sizes > 1
* use 64 bit math for initial offset calculation
2025-06-23 13:11:31 +02:00
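The second bullet is a classic overflow fix: with batch sizes > 1, an element offset can exceed INT_MAX even when each individual dimension fits comfortably in an int. A hedged sketch with hypothetical names:

```cpp
// Illustrative only (hypothetical names): promote before multiplying, since
// (batch * nrows_per_batch + row) * ne0 can overflow 32-bit int with batching.
#include <cstdint>

static inline const float * tensor_row(const float * data,
                                       int row, int batch,
                                       int ne0, int nrows_per_batch) {
    const int64_t offset = ((int64_t)batch * nrows_per_batch + row) * (int64_t)ne0;
    return data + offset;
}
```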
uvos
af3373f1ad
HIP: enable vec fattn on RDNA4 ( #14323 )
2025-06-22 16:51:23 +02:00
Aman Gupta
aa064b2eb7
CUDA: add mean operation ( #14313 )
* CUDA: add mean operation
* add back sum_rows_f32_cuda
* Review: early exit if col!=0
2025-06-22 12:39:54 +08:00
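The reference semantics the new CUDA op must match are simple: reduce each row to its average, i.e. sum_rows scaled by 1/ne0 (which is why sum_rows_f32_cuda comes back in the second bullet). A CPU sketch with hypothetical names, not the CUDA implementation:

```cpp
// Reference semantics sketch: one mean per row of a [nrows, ne0] matrix.
#include <cstdint>

static void mean_rows_ref(const float * x, float * y, int64_t nrows, int64_t ne0) {
    for (int64_t r = 0; r < nrows; ++r) {
        double sum = 0.0;                     // accumulate in double for accuracy
        for (int64_t c = 0; c < ne0; ++c) {
            sum += x[r*ne0 + c];
        }
        y[r] = (float)(sum / (double)ne0);    // one value per row
    }
}
```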
Markus Tavenrath
bb16041cae
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. ( #13792 )
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled.
* remove #ifdef for debug utils and add queue marker.
2025-06-21 08:17:12 +02:00
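Labeling via VK_EXT_debug_utils is a standard Vulkan mechanism; a minimal sketch of naming a compute pipeline (the wrapper is hypothetical, the API calls are standard):

```cpp
// Sketch: attach a human-readable name to a pipeline so it shows up in
// validation-layer messages and capture tools such as RenderDoc.
#include <vulkan/vulkan.h>
#include <cstdint>

static void set_pipeline_name(VkInstance instance, VkDevice device,
                              VkPipeline pipeline, const char * name) {
    auto fn = (PFN_vkSetDebugUtilsObjectNameEXT)
        vkGetInstanceProcAddr(instance, "vkSetDebugUtilsObjectNameEXT");
    if (!fn) {
        return; // VK_EXT_debug_utils not enabled
    }
    VkDebugUtilsObjectNameInfoEXT info = {};
    info.sType        = VK_STRUCTURE_TYPE_DEBUG_UTILS_OBJECT_NAME_INFO_EXT;
    info.objectType   = VK_OBJECT_TYPE_PIPELINE;
    info.objectHandle = (uint64_t)pipeline;
    info.pObjectName  = name;
    fn(device, &info);
}
```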
Georgi Gerganov
67ae5312e2
metal : fix thread-safety ( #14300 )
ggml-ci
2025-06-21 08:04:18 +03:00
Acly
b7147673f2
Add `ggml_roll` (ggml/1274)
* ggml : add ggml_roll
* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
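ggml_roll shifts elements along a dimension with wrap-around, numpy.roll-style. A 1-D reference sketch (hypothetical helper, not the ggml implementation):

```cpp
// Reference semantics: y[i] = x[(i - shift) mod n], with a non-negative mod.
#include <cstdint>
#include <vector>

static std::vector<float> roll_1d(const std::vector<float> & x, int64_t shift) {
    const int64_t n = (int64_t)x.size();
    std::vector<float> y(n);
    for (int64_t i = 0; i < n; ++i) {
        const int64_t src = ((i - shift) % n + n) % n; // wrap-around source index
        y[i] = x[src];
    }
    return y;
}
```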
Aman Gupta
c959f462a0
CUDA: add conv_2d_transpose ( #14287 )
* CUDA: add conv_2d_transpose
* remove direct include of cuda_fp16
* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-20 22:48:24 +08:00
Nicolò Scipione
8308f98c7f
sycl: add usage of enqueue_functions extension ( #14244 )
* Add header and namespace to use enqueue_functions extension
* Convert submit and parallel_for to use new extension in convert.cpp
* Convert submit and parallel_for to use extension in ggml-sycl.cpp
* Convert submit and parallel_for to use extension in gla.cpp
* Convert submit and parallel_for in mmq.cpp
* Convert submit and parallel_for in mmvq.cpp
* Convert submit and parallel_for in remaining files
* Convert all simple parallel_for to nd_launch from enqueue_functions
extension
* Wrapping extension in general function
Create a general function that enables the enqueue_functions extension if
it is enabled in the compiler; otherwise it calls the general SYCL function
to launch kernels.
---------
Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-20 15:07:21 +02:00
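The "general function" in the last bullet presumably gates on the extension's feature-test macro. A sketch assuming the macro name SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS and the experimental nd_launch entry point from the extension spec (both assumptions, not the merged code):

```cpp
#include <sycl/sycl.hpp>
#include <utility>

// Sketch: launch through the enqueue_functions extension when available,
// otherwise fall back to the classic submit + parallel_for pattern.
template <typename Kernel>
void launch_kernel(sycl::queue & q, sycl::nd_range<1> range, Kernel && k) {
#ifdef SYCL_EXT_ONEAPI_ENQUEUE_FUNCTIONS
    // Extension path: free-function launch without a command-group lambda.
    sycl::ext::oneapi::experimental::nd_launch(q, range, std::forward<Kernel>(k));
#else
    q.submit([&](sycl::handler & cgh) {
        cgh.parallel_for(range, std::forward<Kernel>(k));
    });
#endif
}
```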
Christian Kastner
6369be0735
Implement GGML_CPU_ALL_VARIANTS for PowerPC ( #14286 )
* Add PowerPC feature detection and scoring
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC
* ggml-cpu: Delay some initializations until function is called
When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-20 14:17:32 +02:00
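The "delay some initializations" bullet matters because with GGML_BACKEND_DL=ON every variant's shared object gets loaded while scoring, so namespace-scope initializers would run on a CPU that may not support that variant's instructions. Moving the work into a function-local static defers it to first use; an illustrative C++ sketch:

```cpp
#include <array>
#include <cstddef>

// Illustrative only: a "magic static" runs on first call (thread-safe since
// C++11), not at dlopen() time, so variant-specific instructions are safe here.
static const std::array<float, 256> & get_table() {
    static const std::array<float, 256> table = [] {
        std::array<float, 256> t{};
        for (std::size_t i = 0; i < t.size(); ++i) {
            t[i] = (float)i; // stand-in for feature-dependent computation
        }
        return t;
    }();
    return table;
}
```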
Diego Devesa
e28c1b93fd
cuda : synchronize graph capture and cublas handle destruction ( #14288 )
Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread
2025-06-20 13:57:36 +02:00
Georgi Gerganov
d27b3ca175
ggml : fix repack work size for mul_mat_id ( #14292 )
ggml-ci
2025-06-20 11:19:15 +03:00
Charles Xu
9230dbe2c7
ggml: Update KleidiAI to v1.9.0 ( #14277 )
2025-06-20 10:51:01 +03:00
Aman Gupta
9eaa51e7f0
CUDA: add conv_2d_dw ( #14265 )
* CUDA: add conv_2d_dw
* better naming
* simplify using template
* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const
2025-06-20 09:50:24 +08:00
Diego Devesa
8f71d0f3e8
ggml-cpu : remove unnecessary arm feature detection ( #14281 )
Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.
2025-06-19 21:24:14 +02:00
fanyang
456af35eb7
build : suppress gcc15 compile warnings ( #14261 )
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-19 14:49:48 +02:00
Anton Mitkov
600e3e9b50
sycl: Cleanup codepaths in Get Rows in sycl backend ( #14215 )
Addresses unused reorder path
2025-06-19 11:40:21 +01:00
Aaron Teo
faed5a5f5d
llamafile : support s390x SIMD instruction set ( #14273 )
2025-06-19 11:48:54 +02:00
0cc4m
10bb545c5b
Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer ( #14249 )
2025-06-19 09:15:42 +02:00
Georgi Gerganov
ed3290ab34
metal : add mean kernel ( #14267 )
* metal : add mean kernel
ggml-ci
* cont : dedup implementation
ggml-ci
2025-06-19 08:05:21 +03:00
Aaron Teo
50d2227953
ggml-cpu: reduce asm calls for hsum ( #14037 )
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:10:08 +01:00
Aaron Teo
6231c5cd6d
ggml-cpu: fix uncaught underscore terminators ( #14023 )
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:06:49 +01:00
Charles Xu
ef035803eb
ggml: Add Apple support for GGML_CPU_ALL_VARIANTS ( #14258 )
2025-06-18 12:40:07 +01:00
Daniel Bevenius
bbe98d2784
ggml : remove unused ggml_context_container (ggml/1272)
This commit removes the unused `ggml_context_container` structure from
the ggml library. It looks like the usage of this struct was removed in
Commit 4757fe18d56ec11bf9c07feaca6e9d5b5357e7f4 ("ggml : alloc
ggml_contexts on the heap (whisper/2525)").
The motivation for this changes is to improve code clarity/readability.
2025-06-18 09:59:21 +03:00
bandoti
c46503014d
cmake: remove shader-gen step-targets from ggml-vulkan ( #14226 )
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
2025-06-17 22:33:25 +02:00
xctan
860a9e4eef
ggml-cpu : remove the weak alias trick ( #14221 )
2025-06-17 12:58:32 +03:00
R0CKSTAR
fe9d60e74a
musa: fix build warning (unused variable) ( #14231 )
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-06-17 17:48:08 +08:00
Diego Devesa
6adc3c3ebc
llama : add thread safety test ( #14035 )
* llama : add thread safety test
* llamafile : remove global state
* llama : better LLAMA_SPLIT_MODE_NONE logic
when main_gpu < 0 GPU devices are not used
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
bandoti
0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen ( #14179 )
* Remove install step for vulkan-shaders-gen
* Add install step to normalize msvc with make
* Regenerate modified shaders at build-time
2025-06-16 10:32:13 -03:00
uvos
7d6d91babf
HIP: disable rocwmma on gfx12 by default until rocm 7.0 ( #14202 )
2025-06-16 13:47:38 +02:00
Charles Xu
3ba0d843c6
ggml: Add Android support for GGML_CPU_ALL_VARIANTS ( #14206 )
2025-06-16 11:47:57 +02:00
Jeff Bolz
c89c2d1ab9
vulkan: mutex around vkQueueSubmit ( #14127 )
This fixes the remaining crash in test-thread-safety on my system.
2025-06-16 08:21:08 +02:00
xctan
3555b3004b
ggml-cpu : rework weak alias on apple targets ( #14146 )
* ggml-cpu : rework weak alias on apple targets
* fix powerpc detection
* fix ppc detection
* fix powerpc detection on darwin
2025-06-16 13:54:15 +08:00
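For background on this entry and the earlier "remove the weak alias trick" one: on ELF targets a generic symbol can be declared as a weak alias that an optimized translation unit overrides with a strong definition, but Mach-O has no equivalent attribute, hence the Apple-specific rework. A sketch with illustrative symbol names (not the actual ggml symbols):

```cpp
// Baseline implementation; the alias target must live in the same TU.
extern "C" void quantize_row_generic(const float * x, void * y, long n) {
    (void)x; (void)y; (void)n; // ... portable code ...
}

#if defined(__ELF__)
// ELF: expose a weak alias that an optimized variant can override.
extern "C" void quantize_row(const float * x, void * y, long n)
    __attribute__((weak, alias("quantize_row_generic")));
#else
// Mach-O lacks this kind of alias; fall back to plain renaming.
#define quantize_row quantize_row_generic
#endif
```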
uvos
e54b394082
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 ( #14196 )
2025-06-15 17:30:13 +02:00
uvos
2c2caa4443
HIP: Replace usage of deprecated preprocessor macro __AMDGCN_WAVEFRONT_SIZE__ ( #14183 )
2025-06-15 15:45:27 +02:00
Anton Mitkov
0889eba570
sycl: Adding additional cpy dbg print output ( #14034 )
2025-06-13 08:51:39 +01:00
Ewan Crawford
c61285e739
SYCL: Bump oneMath commit ( #14152 )
Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.
With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.
```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: operator()
Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154
Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
2025-06-13 08:45:37 +01:00
Anton Mitkov
ed52f3668e
sycl: Remove not needed copy f16->f32 for dnnl mul mat ( #14125 )
2025-06-12 15:15:11 +02:00
Georgi Gerganov
e2c0b6e46a
cmake : handle whitespaces in path during metal build ( #14126 )
* cmake : handle whitespaces in path during metal build
ggml-ci
* cont : proper fix
ggml-ci
---------
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2025-06-12 10:14:24 +03:00
Christian Kastner
532802f938
Implement GGML_CPU_ALL_VARIANTS for ARM ( #14080 )
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT>, which needs to be set
in cmake, instead of GGML_<FEAT>, which users would set for x86.
This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
Like x86; however, to pass arch flags around within cmake, we use
GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>.
Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.
* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
The other platforms will need their own specific variants.
This also fixes a bug where the variant-building branch was always
executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch, which restores the previous behavior.
2025-06-11 21:07:44 +02:00
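The scoring idea is straightforward on Linux: read the hwcap auxv words and rank variants by the features they require. A sketch (the hwcap constants come from standard Linux headers; the weighting scheme itself is hypothetical):

```cpp
#include <sys/auxv.h>
#include <asm/hwcap.h>

// Hypothetical scoring sketch for aarch64: higher score = more capable
// variant. The loader picks the highest-scoring variant that is usable.
static int score_backend_variant() {
    const unsigned long hwcap  = getauxval(AT_HWCAP);
    const unsigned long hwcap2 = getauxval(AT_HWCAP2);

    int score = 1;                               // baseline armv8-a always works
    if (hwcap  & HWCAP_ASIMDDP) score += 1 << 1; // dotprod (armv8.2)
    if (hwcap2 & HWCAP2_I8MM)   score += 1 << 2; // int8 matmul (armv8.6)
    if (hwcap  & HWCAP_SVE)     score += 1 << 3; // scalable vectors
    return score;
}
```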
Jeff Bolz
bd248d4dc7
vulkan: Better thread-safety for command pools/buffers ( #14116 )
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
2025-06-11 09:48:52 -05:00
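The structure described above might look roughly like this (field names are assumptions); the point is that each context owns its pools, so no two threads ever reset or allocate from the same VkCommandPool:

```cpp
#include <vulkan/vulkan.h>
#include <cstdint>
#include <vector>

// Rough shape only, not the actual PR.
struct vk_command_pool_sketch {
    VkCommandPool pool = VK_NULL_HANDLE;
    std::vector<VkCommandBuffer> cmd_buffers; // allocated from `pool`, reused across submits
    uint32_t queue_family_index = 0;
};
// Two instances per context (compute + transfer) and two per device for
// operations that do not go through a context.
```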
Jeff Bolz
1f7d50b293
vulkan: Track descriptor pools/sets per-context ( #14109 )
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.
2025-06-11 07:19:25 +02:00
lhez
4c763c8d1b
opencl: add `mul_mv_id_q4_0_f32_8x_flat` ( #14003 )
2025-06-10 16:55:58 -07:00
Georgi Gerganov
b7ce1ad1e3
ggml : fix weak alias win32 (whisper/0)
ggml-ci
2025-06-10 18:39:33 +03:00
0cc4m
97340b4c99
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend ( #14099 )
2025-06-10 13:01:33 +01:00
Isaac McFadyen
2bb0467043
rpc : nicer error messages for RPC server crash ( #14076 )
2025-06-10 09:41:01 +03:00