llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aaron Teo	4f017d718a	ggml-cpu: test fix for conversion failure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:55:16 +08:00
Aaron Teo	5424d9e757	ggml-cpu: add breakpoint for debugging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:51:05 +08:00
Aaron Teo	bb9345ca8a	ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:50:05 +08:00
Aaron Teo	e0f8fb930b	ggml-cpu: clarify variable naming Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:43:41 +08:00
Aaron Teo	27b4c3f338	ggml-cpu: remove noop, general code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:41:39 +08:00
Aaron Teo	8312adc980	ggml-cpu: rework noop Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:24:32 +08:00
Aaron Teo	6d507bbeb0	ggml-cpu: switch to vec_xst for 4 element loops also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:23:23 +08:00
Aaron Teo	f9f6c7e897	ggml-cpu: nnpa switch to vec_xst test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:16:35 +08:00
Aaron Teo	6a25fd8531	ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:10:44 +08:00
Aaron Teo	ebc1d19f62	ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 16:01:55 +08:00
Aaron Teo	9330454cb8	ggml-cpu: remove sigint from fp16 store for some reason, the function is not getting a hit when debugged with gdb. we will need to investigate further Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 15:06:31 +08:00
Aaron Teo	575ea9f6c6	ggml-cpu: fp16 load ensured to hit Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 15:00:46 +08:00
Aaron Teo	8f3a5af6c0	ggml-cpu: ensure fp16 and fp32 load and stores are called Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:57:25 +08:00
Aaron Teo	94f10ca189	ggml-cpu: fix float placeholder Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:53:15 +08:00
Aaron Teo	d9cc63a94a	ggml-cpu: fix print vs printf Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:51:38 +08:00
Aaron Teo	48b820d05f	ggml-cpu: add debugging prints to see if dlf16 is correct Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-06-21 14:50:33 +08:00
Aaron Teo	0394a006c5	docs: update s390x docs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `01b929491b`)	2025-06-21 14:48:46 +08:00
Aaron Teo	ffe296457e	ggml-cpu: better variable names Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `2f58bbcbb8`)	2025-06-21 14:47:46 +08:00
Aaron Teo	ebf9f34a38	ggml-cpu: add fp32->fp16 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `0ff0d65162`)	2025-06-21 14:47:23 +08:00
Aaron Teo	45a4cf651c	ggml-cpu: add fp16->fp32 nnpa first Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `8d4a7987f9`)	2025-06-21 14:47:12 +08:00
Aaron Teo	5801806f70	ggml-cpu: add nnpa compile flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> (cherry picked from commit `4a9f60c201`)	2025-06-21 14:46:41 +08:00
Markus Tavenrath	bb16041cae	Add support for VK_EXT_debug_utils to add labels to Vulkan objects. (#13792) * Add support for VK_EXT_debug_utils to add labels to Vulkan objects. In step 1 compute pipelines are getting labeled. * remove #ifdef for debug utils and add queue marker.	2025-06-21 08:17:12 +02:00
Sigbjørn Skjæret	58cba76a9a	gguf-py : fix TemplateProcessing pair when bos/eos is missing (#14312)	2025-06-21 07:33:21 +02:00
Georgi Gerganov	67ae5312e2	metal : fix thread-safety (#14300) ggml-ci	2025-06-21 08:04:18 +03:00
Georgi Gerganov	692e3cdd0a	memory : rename interface to llama_memory_context_i (#14296) * memory : rename interface to llama_memory_context_i ggml-ci * cont : fix comments * cont : use "mctx" for referencing a memory context ggml-ci	2025-06-21 08:03:46 +03:00
Daniel Han	b23fa0b3f4	convert : fix Llama 4 conversion (#14311)	2025-06-21 06:32:01 +02:00
Georgi Gerganov	06cbedfca1	sync : ggml ggml-ci	2025-06-20 21:02:47 +03:00
Acly	b7147673f2	Add `ggml_roll` (ggml/1274) * ggml : add ggml_roll * use set/get_op_params & std::min	2025-06-20 21:02:47 +03:00
David Chiu	d860dd99a4	docs : fix the link to llama.h (#14293)	2025-06-20 19:43:35 +02:00
Aman Gupta	c959f462a0	CUDA: add conv_2d_transpose (#14287) * CUDA: add conv_2d_transpose * remove direct include of cuda_fp16 * Review: add brackets for readability, remove ggml_set_param and add asserts	2025-06-20 22:48:24 +08:00
Sigbjørn Skjæret	22015b2092	lint : remove trailing whitepace (#14304)	2025-06-20 16:37:44 +02:00
Ruikai Peng	dd6e6d0b6a	vocab : prevent tokenizer overflow (#14301) * vocab : prevent stack overflow in tokenize * vocab : return error instead of aborting on oversized token count * vocab : INT32_MIN from llama_tokenize on overflow	2025-06-20 07:13:06 -07:00
Nicolò Scipione	8308f98c7f	sycl: add usage of enqueue_functions extension (#14244) * Add header and namespace to use enqueue_functions extension * Convert submit and parallel_for to use new extension in convert.cpp * Convert submit and parallel_for to use extension in ggml-sycl.cpp * Convert submit and parallel_for to use extension in gla.cpp * Convert submit and parallel_for in mmq.cpp * Convert submit and parallel_for in mmvq.cpp * Convert submit and parallel_for in remaining files * Convert all simple parallel_for to nd_launch from enqueue_functions extension * Wrapping extension in general function Create a general function that enable the enqueue_functions extension if it is enable in the compiler, otherwise call the general SYCL function to launch kernels. --------- Signed-off-by: nscipione <nicolo.scipione@codeplay.com>	2025-06-20 15:07:21 +02:00
Christian Kastner	6369be0735	Implement GGML_CPU_ALL_VARIANTS for PowerPC (#14286) * Add PowerPC feature detection and scoring * ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for PowerPC * ggml-cpu: Delay some initializations until function is called When using GGML_BACKEND_DL=ON, these initializations might use instructions that are not supported by the current CPU. --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-06-20 14:17:32 +02:00
Sigbjørn Skjæret	88fc854b4b	llama : improve sep token handling (#14272)	2025-06-20 14:04:09 +02:00
Diego Devesa	e28c1b93fd	cuda : synchronize graph capture and cublas handle destruction (#14288) Workarounds an issue that may cause CUDA graph capture to fail when a cuBLAS handle is destroyed in a different thread	2025-06-20 13:57:36 +02:00
Georgi Gerganov	d27b3ca175	ggml : fix repack work size for mul_mat_id (#14292) ggml-ci	2025-06-20 11:19:15 +03:00
Charles Xu	9230dbe2c7	ggml: Update KleidiAI to v1.9.0 (#14277)	2025-06-20 10:51:01 +03:00
Georgi Gerganov	812939a9e9	model : more uniform output id handling (#14275) * model : more uniform output id handling ggml-ci * cont : revert n_outputs < n_tokens optimization ggml-ci * cont : fix out_ids initialization ggml-ci	2025-06-20 10:50:27 +03:00
Georgi Gerganov	4c9fdfbe15	ubatch : new splitting logic (#14217) ggml-ci	2025-06-20 10:14:14 +03:00
Aman Gupta	9eaa51e7f0	CUDA: add conv_2d_dw (#14265) * CUDA: add conv_2d_dw * better naming * simplify using template * Review: fix operation ordering in ggml-cuda, use __forceinline__, use more const	2025-06-20 09:50:24 +08:00
Diego Devesa	8f71d0f3e8	ggml-cpu : remove unnecesary arm feature detection (#14281) Support for Arm runtime feature detection has now been added to GGML_CPU_ALL_VARIANTS. This removes the old and not very functional code.	2025-06-19 21:24:14 +02:00
Alex Trotta	381174bbda	gguf-py : make sentencepiece optional (#14200) * Make sentencepiece optional * Bump to 0.18.0 * Bump patch instead of minor Co-authored-by: compilade <git@compilade.net> --------- Co-authored-by: compilade <git@compilade.net>	2025-06-19 15:56:12 +02:00
aa956	d67341dc18	server : add server parameters for draft model cache type (#13782) Co-authored-by: aa956 <27946957+aa956@users.noreply.github.com>	2025-06-19 16:01:03 +03:00
fanyang	456af35eb7	build : suppress gcc15 compile warnings (#14261) * Change _contains_any() substrs to std::string_view and fix the find comparison logic.	2025-06-19 14:49:48 +02:00
Anton Mitkov	600e3e9b50	sycl: Cleanup codepaths in Get Rows in sycl backend (#14215) Addresses unused reorder path	2025-06-19 11:40:21 +01:00
bashayer hijji	fffcce535e	llama-bench : add --no-warmup flag (#14224) (#14270) Add no_warmup parameter to cmd_params struct and command-line parsing to allow users to skip warmup runs before benchmarking. - Add no_warmup boolean field to cmd_params struct - Add --no-warmup command-line argument parsing - Add help text documentation for the new flag - Wrap existing warmup logic in conditional check - Maintain full backward compatibility (warmup enabled by default) Addresses #14224	2025-06-19 12:24:12 +02:00
pqnet	5fc7856815	convert : fix remote option in Windows (#14100)	2025-06-19 12:21:40 +02:00
Aaron Teo	faed5a5f5d	llamafile : support s390x SIMD instruction set (#14273)	2025-06-19 11:48:54 +02:00
0cc4m	10bb545c5b	Vulkan: Set device max size for host memory to avoid OOM warning and fallback to CPU buffer (#14249)	2025-06-19 09:15:42 +02:00

5752 Commits