llama.cpp

Commit Graph

Author	SHA1	Message	Date
Piotr Wilkin	13a350fa1a	Whitespace	2026-02-16 22:39:12 +01:00
Piotr Wilkin	b0ed986aec	-> Refactor autoparser analyzer structure -> Fix content truncation -> Fix errors in capability detection due to non-empty assistant message -> Add missing debug prints for Jinja	2026-02-16 22:39:12 +01:00
Piotr Wilkin	2da282018e	Whoops	2026-02-16 22:39:12 +01:00
Piotr Wilkin	58d822ca0c	One more crazy spacing out	2026-02-16 22:39:12 +01:00
Piotr Wilkin	f8b0b75a00	Get rid of some crazy formatting	2026-02-16 22:39:12 +01:00
Piotr Wilkin	18054b4e44	Document helpers	2026-02-16 22:39:12 +01:00
Piotr Wilkin	ffdce9ca29	Add compilation guard to fix Windows compilation errors	2026-02-16 22:39:12 +01:00
Piotr Wilkin	5e38bac7a3	Post-merge adapt	2026-02-16 22:39:12 +01:00
Piotr Wilkin	ed82289609	Revert obsolete server-context change	2026-02-16 22:39:12 +01:00
Piotr Wilkin	964972d64e	Fix windows build	2026-02-16 22:39:12 +01:00
Piotr Wilkin	0a2090a8d6	Regenerate documentation	2026-02-16 22:39:12 +01:00
Piotr Wilkin	5164f2f3c8	Fix case with object inside object, refactor long methods.	2026-02-16 22:39:12 +01:00
Piotr Wilkin	8397fdddc6	Fix number partial parsing issue	2026-02-16 22:39:12 +01:00
Piotr Wilkin	5df5390c72	More edge cases	2026-02-16 22:39:12 +01:00
Piotr Wilkin	971b216ce1	Fix pesky issue on optional trailing arguments in function calls for TAGGED format	2026-02-16 22:39:11 +01:00
Piotr Wilkin	fcc61e6759	Remove [[noreturn]] as it causes compilation problems on Mac.	2026-02-16 22:39:11 +01:00
Piotr Wilkin	b223a7b1aa	We don't like segfaults (or failing tests).	2026-02-16 22:39:11 +01:00
Piotr Wilkin	4249e9889f	Fix minor regressions, add [[noreturn]] attrib	2026-02-16 22:39:11 +01:00
Piotr Wilkin	0abe32a3d8	Fix incorrect coercion of strings to non-string types during parsing	2026-02-16 22:39:11 +01:00
Piotr Wilkin	f1937febff	Feeding the hungry editor checker god.	2026-02-16 22:39:11 +01:00
Piotr Wilkin	c35b31abe5	Fix error in argument processing	2026-02-16 22:39:11 +01:00
Piotr Wilkin	5cabb3c737	Revert bad change, fix some templates and most tests	2026-02-16 22:39:11 +01:00
Piotr Wilkin	bb6337fb90	More robust reasoning detection	2026-02-16 22:39:11 +01:00
Piotr Wilkin	169a0fa0f6	Fix reasoning detection	2026-02-16 22:39:11 +01:00
Piotr Wilkin	2eedbb24e0	Quick vibe-coded fix for proper object printing	2026-02-16 22:39:11 +01:00
Piotr Wilkin	a4feadb10d	Missed this.	2026-02-16 22:39:11 +01:00
Piotr Wilkin	1e3d93cb6b	ANOTHER GIANT POST-FIXUP SQUISH	2026-02-16 22:39:11 +01:00
Piotr Wilkin	52d31fa024	THE GIANT AUTOPARSER SQUISH	2026-02-16 22:39:11 +01:00
Piotr Wilkin	052ad2ab8a	Make call IDs nine-character	2026-02-16 22:39:11 +01:00
Piotr Wilkin	47a7ebc0c1	Fix sanitizer warnings	2026-02-16 22:39:11 +01:00
Piotr Wilkin	b403c9aaa2	Fix bad typo	2026-02-16 22:39:11 +01:00
Piotr Wilkin	f2a4ae6ba8	Add workaround for templates requiring non-null content	2026-02-16 22:39:11 +01:00
AesSedai	d612901116	perplexity: add proper batching (#19661 )	2026-02-16 18:44:44 +02:00
Ivan Chikish	cceb1b4e33	common : inline functions (#18639 )	2026-02-16 17:52:24 +02:00
Judd	d23a55997d	ggml : make `ggml_is_view` as API (#19539 ) * make `ggml_is_view` as API * introduce `ggml_aux_is_view` as inline version for internal use. * change `ggml_aux_is_view` to `ggml_impl_is_view`	2026-02-16 17:43:34 +02:00
Saurabh Dash	5f28c53d11	model: Add support for Tiny Aya Models (#19611 ) * changes for tiny aya * changes to hash * changes to vocab * fix some tokenizer regex edge cases * update comment * add some comments for regex * Apply suggestion from @ngxson --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-02-16 16:28:46 +01:00
Adrien Gallouët	4408494144	build : rework llama_option_depr to handle LLAMA_CURL (#19658 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-16 16:06:48 +01:00
Mario Limonciello	2ba9adc093	Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm versions (#19591 ) Avoids issues with ROCm 6.4.4. Closes: https://github.com/ggml-org/llama.cpp/issues/19580 Fixes: `6845f7f87` ("Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461)") Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2026-02-16 14:46:08 +01:00
Georgi Gerganov	cc45f2ada6	models : deduplicate delta-net graphs for Qwen family (#19597 ) * models : add llm_build_delta_net_base * cont : keep qwen35 and qwen35moe graphs intact * cont : add comments	2026-02-16 14:35:04 +02:00
Georgi Gerganov	d5dfc33027	graph : fix KQ mask, lora, cvec reuse checks (#19644 ) * graph : fix KQ mask reuse condition * cont : dedup KQ mask build and can_reuse * cont : fix build * graph : fix adapter check for reuse	2026-02-16 09:21:11 +02:00
abhijain1204fujitsu	267ba5a1d9	ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (#19132 ) * Updated repack.cpp * Updated repack.cpp * Updated repack.cpp * Added if condition to support only vector length 256. * Changed the format removed comments and duplicate variable * If SVE 256 not present then was using generic function to compute, hence slowing the performance. So added code if SVE 256 is not present then use NEON code. * Code format change suggestion --------- Co-authored-by: Vithule, Prashant <Prashant.Vithule@fujitsu.com>	2026-02-16 14:38:43 +08:00
Georgi Gerganov	ff4affb4c1	sync : ggml	2026-02-15 22:24:29 +02:00
Georgi Gerganov	55d58599c8	ggml : bump version to 0.9.7 (ggml/1425)	2026-02-15 22:24:29 +02:00
Georgi Gerganov	1a8c700bfd	ggml : bump version to 0.9.6 (ggml/1423)	2026-02-15 22:24:29 +02:00
David Friehs	27b93cbd15	cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (#19624 ) * cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization - load all 8 int8 for a grid position in one load - calculate signs via popcnt instead of fetching from ksigns table - broadcast signs to drop individual shift/mask * cuda: iq2xxs: simplify sum scaling express `(sum * scale + sum / 2) / 4` as `(sum * (scale * 2 + 1)) / 8` express `((aux32 >> 28) * 2 + 1)` as `(aux32 >> 27 | 1)` saves 3 registers for mul_mat_vec_q (152 -> 149) according to nsight AFAICT no overflow can occur here as iq2xxs values are far too small * uint -> uint32_t error: identifier "uint" is undefined	2026-02-15 22:38:42 +05:30
Aaron Teo	6e67fd2144	docs: update s390x build docs (#19643 )	2026-02-16 00:33:34 +08:00
Adrien Gallouët	9e118b97c4	build : remove LLAMA_HTTPLIB option (#19623 ) This option was introduced as a workaround because cpp-httplib could not build on visionOS. Since it has been fixed and now compiles on all platforms, we can remove it and simplify many things. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-15 15:38:50 +01:00
Daniel Bevenius	57088276d4	cmake : check if KleidiAI API has been fetched (#19640 ) This commit addresses a build issue with the KleidiAI backend when building multiple cpu backends. Commit `3a00c98584` ("cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL") introduced a change where FetchContent_Populate is called instead of FetchContent_MakeAvailable, where the latter does handle this case (it is idempotent but FetchContent_Populate is not). I missed this during my review and I should not have committed without verifying the CI failure, sorry about that.	2026-02-15 13:59:38 +01:00
Georgi Gerganov	341bc7d23c	context : fix output reorder with backend sampling (#19638 )	2026-02-15 14:57:40 +02:00
Georgi Gerganov	08e6d914b8	ggml : avoid UB in gemm ukernel (#19642 )	2026-02-15 14:56:35 +02:00

8108 Commits
