Commit Graph

5752 Commits

Author SHA1 Message Date
Aaron Teo 4f017d718a
ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:55:16 +08:00
Aaron Teo 5424d9e757
ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:51:05 +08:00
Aaron Teo bb9345ca8a
ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:50:05 +08:00
Aaron Teo e0f8fb930b
ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:43:41 +08:00
Aaron Teo 27b4c3f338
ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:41:39 +08:00
Aaron Teo 8312adc980
ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:24:32 +08:00
Aaron Teo 6d507bbeb0
ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:23:23 +08:00
Aaron Teo f9f6c7e897
ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:16:35 +08:00
Aaron Teo 6a25fd8531
ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:10:44 +08:00
Aaron Teo ebc1d19f62
ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 16:01:55 +08:00
Aaron Teo 9330454cb8
ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with
    gdb. we will need to investigate further

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:06:31 +08:00
Aaron Teo 575ea9f6c6
ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 15:00:46 +08:00
Aaron Teo 8f3a5af6c0
ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:57:25 +08:00
Aaron Teo 94f10ca189
ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:53:15 +08:00
Aaron Teo d9cc63a94a
ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:51:38 +08:00
Aaron Teo 48b820d05f
ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-21 14:50:33 +08:00
Aaron Teo 0394a006c5
docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b)
2025-06-21 14:48:46 +08:00
Aaron Teo ffe296457e
ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8)
2025-06-21 14:47:46 +08:00
Aaron Teo ebf9f34a38
ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162)
2025-06-21 14:47:23 +08:00
Aaron Teo 45a4cf651c
ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9)
2025-06-21 14:47:12 +08:00
Aaron Teo 5801806f70
ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201)
2025-06-21 14:46:41 +08:00
Markus Tavenrath bb16041cae
Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792)
* Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled.

* remove #ifdef for debug utils and add queue marker.
2025-06-21 08:17:12 +02:00
Sigbjørn Skjæret 58cba76a9a
gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312) 2025-06-21 07:33:21 +02:00
Georgi Gerganov 67ae5312e2
metal : fix thread-safety (#14300)
ggml-ci
2025-06-21 08:04:18 +03:00
Georgi Gerganov 692e3cdd0a
memory : rename interface to llama_memory_context_i (#14296)
* memory : rename interface to llama_memory_context_i

ggml-ci

* cont : fix comments

* cont : use "mctx" for referencing a memory context

ggml-ci
2025-06-21 08:03:46 +03:00
Daniel Han b23fa0b3f4
convert : fix Llama 4 conversion (#14311) 2025-06-21 06:32:01 +02:00
Georgi Gerganov 06cbedfca1 sync : ggml
ggml-ci
2025-06-20 21:02:47 +03:00
Acly b7147673f2 Add `ggml_roll` (ggml/1274)
* ggml : add ggml_roll

* use set/get_op_params & std::min
2025-06-20 21:02:47 +03:00
David Chiu d860dd99a4
docs : fix the link to llama.h (#14293) 2025-06-20 19:43:35 +02:00
Aman Gupta c959f462a0
CUDA: add conv_2d_transpose (#14287)
* CUDA: add conv_2d_transpose

* remove direct include of cuda_fp16

* Review: add brackets for readability, remove ggml_set_param and add asserts
2025-06-20 22:48:24 +08:00
Sigbjørn Skjæret 22015b2092
lint : remove trailing whitepace (#14304) 2025-06-20 16:37:44 +02:00
Ruikai Peng dd6e6d0b6a
vocab : prevent tokenizer overflow (#14301)
* vocab : prevent stack overflow in tokenize

* vocab : return error instead of aborting on oversized token count

* vocab : INT32_MIN from llama_tokenize on overflow
2025-06-20 07:13:06 -07:00
Nicolò Scipione 8308f98c7f
sycl: add usage of enqueue_functions extension (#14244)
* Add header and namespace to use enqueue_functions extension

* Convert submit and parallel_for to use new extension in convert.cpp

* Convert submit and parallel_for to use extension in ggml-sycl.cpp

* Convert submit and parallel_for to use extension in gla.cpp

* Convert submit and parallel_for in mmq.cpp

* Convert submit and parallel_for in mmvq.cpp

* Convert submit and parallel_for in remaining files

* Convert all simple parallel_for to nd_launch from enqueue_functions
extension

* Wrapping extension in general function

Create a general function that enable the enqueue_functions extension if
it is enable in the compiler, otherwise call the general SYCL function
to launch kernels.

---------

Signed-off-by: nscipione <nicolo.scipione@codeplay.com>
2025-06-20 15:07:21 +02:00
Christian Kastner 6369be0735
Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286)
* Add PowerPC feature detection and scoring

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC

* ggml-cpu: Delay some initializations until function is called

When using GGML_BACKEND_DL=ON, these initializations might use
instructions that are not supported by the current CPU.

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-06-20 14:17:32 +02:00
Sigbjørn Skjæret 88fc854b4b
llama : improve sep token handling (#14272) 2025-06-20 14:04:09 +02:00
Diego Devesa e28c1b93fd
cuda : synchronize graph capture and cublas handle destruction (#14288)
Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread
2025-06-20 13:57:36 +02:00
Georgi Gerganov d27b3ca175
ggml : fix repack work size for mul_mat_id (#14292)
ggml-ci
2025-06-20 11:19:15 +03:00
Charles Xu 9230dbe2c7
ggml: Update KleidiAI to v1.9.0 (#14277) 2025-06-20 10:51:01 +03:00
Georgi Gerganov 812939a9e9
model : more uniform output id handling (#14275)
* model : more uniform output id handling

ggml-ci

* cont : revert n_outputs < n_tokens optimization

ggml-ci

* cont : fix out_ids initialization

ggml-ci
2025-06-20 10:50:27 +03:00
Georgi Gerganov 4c9fdfbe15
ubatch : new splitting logic (#14217)
ggml-ci
2025-06-20 10:14:14 +03:00
Aman Gupta 9eaa51e7f0
CUDA: add conv_2d_dw (#14265)
* CUDA: add conv_2d_dw

* better naming

* simplify using template

* Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const
2025-06-20 09:50:24 +08:00
Diego Devesa 8f71d0f3e8
ggml-cpu : remove unnecesary arm feature detection (#14281)
Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.
2025-06-19 21:24:14 +02:00
Alex Trotta 381174bbda
gguf-py : make sentencepiece optional (#14200)
* Make sentencepiece optional

* Bump to 0.18.0

* Bump patch instead of minor

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: compilade <git@compilade.net>
2025-06-19 15:56:12 +02:00
aa956 d67341dc18
server : add server parameters for draft model cache type (#13782)
Co-authored-by: aa956 <27946957+aa956@users.noreply.github.com>
2025-06-19 16:01:03 +03:00
fanyang 456af35eb7
build : suppress gcc15 compile warnings (#14261)
* Change _contains_any() substrs to std::string_view and fix the find comparison logic.
2025-06-19 14:49:48 +02:00
Anton Mitkov 600e3e9b50
sycl: Cleanup codepaths in Get Rows in sycl backend (#14215)
Addresses unused reorder path
2025-06-19 11:40:21 +01:00
bashayer hijji fffcce535e
llama-bench : add --no-warmup flag (#14224) (#14270)
Add no_warmup parameter to cmd_params struct and command-line parsing to allow users to skip warmup runs before benchmarking.

- Add no_warmup boolean field to cmd_params struct

- Add --no-warmup command-line argument parsing

- Add help text documentation for the new flag

- Wrap existing warmup logic in conditional check

- Maintain full backward compatibility (warmup enabled by default)

Addresses #14224
2025-06-19 12:24:12 +02:00
pqnet 5fc7856815
convert : fix remote option in Windows (#14100) 2025-06-19 12:21:40 +02:00
Aaron Teo faed5a5f5d
llamafile : support s390x SIMD instruction set (#14273) 2025-06-19 11:48:54 +02:00
0cc4m 10bb545c5b
Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249) 2025-06-19 09:15:42 +02:00