llama.cpp

Commit Graph

Author	SHA1	Message	Date
Piotr Wilkin	3605e78569	Refactor into class-based approach	2026-02-14 00:17:43 +01:00
Piotr Wilkin	6415d0f03f	Add TODO	2026-02-13 14:42:26 +01:00
Piotr Wilkin	24cc1bcd6d	Clean algorithm for calculate_diff_split; fix buggy expectations	2026-02-13 03:17:20 +01:00
Piotr Wilkin	e772822011	Whitespace	2026-02-13 00:55:56 +01:00
Piotr Wilkin	28fcef67c0	-> Refactor autoparser analyzer structure -> Fix content truncation -> Fix errors in capability detection due to non-empty assistant message -> Add missing debug prints for Jinja	2026-02-13 00:55:35 +01:00
Piotr Wilkin	822fd2bee9	Whoops	2026-02-12 17:22:59 +01:00
Piotr Wilkin	3096ecaa95	One more crazy spacing out	2026-02-11 23:44:52 +01:00
Piotr Wilkin	e40d4cd706	Get rid of some crazy formatting	2026-02-11 22:53:02 +01:00
Piotr Wilkin	56ca124850	Document helpers	2026-02-11 22:42:16 +01:00
Piotr Wilkin	d69ec41ee0	Post-merge adapt	2026-02-11 13:47:30 +01:00
Piotr Wilkin	bd549b3b37	Fix case with object inside object, refactor long methods.	2026-02-11 13:47:29 +01:00
Piotr Wilkin	2081e9b056	Fix number partial parsing issue	2026-02-11 13:47:29 +01:00
Piotr Wilkin	b260de1d86	More edge cases	2026-02-11 13:47:29 +01:00
Piotr Wilkin	60717b3e5a	Fix pesky issue on optional trailing arguments in function calls for TAGGED format	2026-02-11 13:47:29 +01:00
Piotr Wilkin	c2f6fc3a17	Remove [[noreturn]] as it causes compilation problems on Mac.	2026-02-11 13:47:29 +01:00
Piotr Wilkin	f71ae707ba	Fix minor regressions, add [[noreturn]] attrib	2026-02-11 13:47:29 +01:00
Piotr Wilkin	09b447a487	Fix incorrect coercion of strings to non-string types during parsing	2026-02-11 13:47:29 +01:00
Piotr Wilkin	a01e15280a	Feeding the hungry editor checker god.	2026-02-11 13:47:29 +01:00
Piotr Wilkin	384cafc98b	Fix error in argument processing	2026-02-11 13:47:29 +01:00
Piotr Wilkin	3770566c45	Revert bad change, fix some templates and most tests	2026-02-11 13:47:29 +01:00
Piotr Wilkin	9ba9a94819	More robust reasoning detection	2026-02-11 13:47:29 +01:00
Piotr Wilkin	80b7e161ff	Fix reasoning detection	2026-02-11 13:47:29 +01:00
Piotr Wilkin	b0853baca7	Quick vibe-coded fix for proper object printing	2026-02-11 13:47:29 +01:00
Piotr Wilkin	1662fa5bea	ANOTHER GIANT POST-FIXUP SQUISH	2026-02-11 13:47:29 +01:00
Piotr Wilkin	7e6f75a414	THE GIANT AUTOPARSER SQUISH	2026-02-11 13:47:29 +01:00
Piotr Wilkin	571805b348	Make call IDs nine-character	2026-02-11 13:47:29 +01:00
Piotr Wilkin	93f0cc05de	Fix sanitizer warnings	2026-02-11 13:47:29 +01:00
Piotr Wilkin	96316496d5	Fix bad typo	2026-02-11 13:47:29 +01:00
Piotr Wilkin	9a3ac05157	Add workaround for templates requiring non-null content	2026-02-11 13:47:29 +01:00
Adrien Gallouët	0c1f39a9ae	common : improve download error reporting (#19491 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-11 09:27:55 +01:00
thecaptain789	8ee538ce73	llama : correct typos 'occured' and 'occurences' (#19414 ) Co-authored-by: thecaptain789 <thecaptain789@users.noreply.github.com>	2026-02-11 07:05:31 +01:00
Xuan-Son Nguyen	98e57ca422	chat: fix case where template accepts type content only (#19419 ) * chat: fix case where template accepts type content only * rm stray log * reuse render_message_to_json	2026-02-09 22:14:12 +01:00
Sascha Rogmann	292f6908cd	spec : remove check rate (#19377 ) * spec: remove parameter spec-ngram-check-rate * spec : renamed statistics vars * spec : add n_call_begin, n_call_accept * spec : don't enable key-map-stats	2026-02-09 15:30:50 +02:00
Georgi Gerganov	dfde5993ea	common : add common_speculative_is_compat() (#19270 ) * llama : add llama_memory_can_rm_suffix() * Revert "llama : add llama_memory_can_rm_suffix()" This reverts commit `d30e59b62a`. * spec : check if the target context is compatible for spec decoding	2026-02-06 16:47:22 +02:00
Xuan-Son Nguyen	e0c93af2a0	debug: make common_debug_print_tensor readable (#19331 ) * debug: make common_debug_print_tensor readable * editorconfig	2026-02-04 17:55:31 +01:00
Georgi Gerganov	d838c22bb3	spec : fix the check-rate logic of ngram-simple (#19261 ) * spec : fix the check-rate logic of ngram-simple * cont : refactor + fix checks	2026-02-04 10:39:53 +02:00
Georgi Gerganov	aeb827a3cc	spec : simplify time measurement using common_time_meas (#19262 )	2026-02-03 08:20:15 +02:00
Sid Mohan	0dfcd3b607	jinja : add missing 'in' test to template engine (#19004 ) (#19239 ) * jinja : add missing 'in' test to template engine (#19004) The jinja template parser was missing the 'in' test from global_builtins(), causing templates using reject("in", ...), select("in", ...), or 'x is in(y)' to fail with "selectattr: unknown test 'in'". This broke tool-calling for Qwen3-Coder and any other model whose chat template uses the 'in' test. Added test_is_in supporting array, string, and object containment checks, mirroring the existing 'in' operator logic in runtime.cpp. Includes test cases for all three containment types plus reject/select filter usage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * reuse test_is_in in binary op --------- Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-02-02 21:00:55 +01:00
Sascha Rogmann	b4d05a3d2f	spec : various improvements to ngram-map + docs (#19253 ) * spec: ngram-map and reasoning chats * spec: add t_begin and t_accept * ngram-map : add internal hash map * docs : update ngram-map, add ngram-mod * docs : fix ngram-map-k * docs : differences between implementations	2026-02-02 08:26:58 +02:00
Georgi Gerganov	4927795810	ngram-mod : fix build [no ci] (#19216 )	2026-01-30 21:27:27 +02:00
Georgi Gerganov	dabaa2e77a	spec : add ngram-mod (#19164 ) * spec : add ngram-mod * cont : simplify + keep track of occupancy * cont : cleanup * cont : move initialization to common/speculative * cont : cleanup * cont : cleanup * cont : fix	2026-01-30 18:21:48 +02:00
Marcello Seri	2e916f996a	jinja : add unordered_map include to value.h [no ci] (#19205 ) On macos Sequoia 15.7.3, x86_64, the build has recently started failing with ``` In file included from .../code/cpp/llama.cpp/common/jinja/string.cpp:2: .../code/cpp/llama.cpp/common/./jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std' 478 \| std::unordered_map<value, value, value_hasher, value_equivalence> unordered; \| ~~~~~^ In file included from .../code/cpp/llama.cpp/common/jinja/caps.cpp:1: .../code/cpp/llama.cpp/common/jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std' 478 \| std::unordered_map<value, value, value_hasher, value_equivalence> unordered; \| ~~~~~^ In file included from .../code/cpp/llama.cpp/common/jinja/value.cpp:1: In file included from .../code/cpp/llama.cpp/common/jinja/runtime.h:4: .../code/cpp/llama.cpp/common/jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std' 478 \| std::unordered_map<value, value, value_hasher, value_equivalence> unordered; [...] ``` After a bit of digging to make sure all the appropriate flags were used, I noticed that the necessary header was not included. This fixes the build for me and should not negatively affect other builds that for some reason were already succeeding	2026-01-30 16:09:44 +01:00
Aldehir Rojas	7b7ae857f6	chat : add parsing for solar-open-100b (#18540 ) * chat : add parsing for solar-open-100b * add comments to rules * cont : make assistant start optional * cont : remove assistant start prefix altogether --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-01-29 16:06:15 +01:00
Sigbjørn Skjæret	b45ef2702c	jinja : do not pass empty tools and add some none filters (#19176 )	2026-01-29 14:06:54 +01:00
Georgi Gerganov	eed25bc6b0	arg : add -kvu to llama-batched-bench (#19172 )	2026-01-29 08:50:47 +02:00
Sascha Rogmann	72d3b1898a	spec : add self‑speculative decoding (no draft model required) + refactor (#18471 ) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-28 19:42:42 +02:00
Sigbjørn Skjæret	60368e1d73	jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147 ) * undefined is treated as iterable (string/array) by filters `tojson` is not a supported `undefined` filter * add tests * add sequence and iterable tests keep it DRY and fix some types	2026-01-28 14:40:29 +01:00
Georgi Gerganov	631cbfcc7a	cuda : fix "V is K view" check for non-unified KV cache (#19145 )	2026-01-28 09:15:27 +02:00
Georgi Gerganov	c5c64f72ac	llama : disable Direct IO by default (#19109 ) * llama : disable Direct IO by default * cont : override mmap if supported	2026-01-28 09:11:13 +02:00
Sigbjørn Skjæret	2b4cbd2834	jinja : implement mixed type object keys (#18955 ) * implement mixed type object keys * add tests * refactor * minor fixes * massive refactor * add more tests * forgotten tuples * fix array/object is_hashable * correct (albeit broken) jinja responses verified with transformers * improved hashing and equality * refactor hash function * more exhaustive test case * clean up * cont * cont (2) * missing cstring --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-01-27 19:50:42 +01:00


756 Commits