756 Commits

Author SHA1 Message Date
Piotr Wilkin 3605e78569 Refactor into class-based approach 2026-02-14 00:17:43 +01:00
Piotr Wilkin 6415d0f03f Add TODO 2026-02-13 14:42:26 +01:00
Piotr Wilkin 24cc1bcd6d Clean algorithm for calculate_diff_split; fix buggy expectations 2026-02-13 03:17:20 +01:00
Piotr Wilkin e772822011 Whitespace 2026-02-13 00:55:56 +01:00
Piotr Wilkin 28fcef67c0 -> Refactor autoparser analyzer structure
-> Fix content truncation
-> Fix errors in capability detection due to non-empty assistant message
-> Add missing debug prints for Jinja
2026-02-13 00:55:35 +01:00
Piotr Wilkin 822fd2bee9 Whoops 2026-02-12 17:22:59 +01:00
Piotr Wilkin 3096ecaa95 One more crazy spacing out 2026-02-11 23:44:52 +01:00
Piotr Wilkin e40d4cd706 Get rid of some crazy formatting 2026-02-11 22:53:02 +01:00
Piotr Wilkin 56ca124850 Document helpers 2026-02-11 22:42:16 +01:00
Piotr Wilkin d69ec41ee0 Post-merge adapt 2026-02-11 13:47:30 +01:00
Piotr Wilkin bd549b3b37 Fix case with object inside object, refactor long methods. 2026-02-11 13:47:29 +01:00
Piotr Wilkin 2081e9b056 Fix number partial parsing issue 2026-02-11 13:47:29 +01:00
Piotr Wilkin b260de1d86 More edge cases 2026-02-11 13:47:29 +01:00
Piotr Wilkin 60717b3e5a Fix pesky issue on optional trailing arguments in function calls for TAGGED format 2026-02-11 13:47:29 +01:00
Piotr Wilkin c2f6fc3a17 Remove [[noreturn]] as it causes compilation problems on Mac. 2026-02-11 13:47:29 +01:00
Piotr Wilkin f71ae707ba Fix minor regressions, add [[noreturn]] attrib 2026-02-11 13:47:29 +01:00
Piotr Wilkin 09b447a487 Fix incorrect coercion of strings to non-string types during parsing 2026-02-11 13:47:29 +01:00
Piotr Wilkin a01e15280a Feeding the hungry editor checker god. 2026-02-11 13:47:29 +01:00
Piotr Wilkin 384cafc98b Fix error in argument processing 2026-02-11 13:47:29 +01:00
Piotr Wilkin 3770566c45 Revert bad change, fix some templates and most tests 2026-02-11 13:47:29 +01:00
Piotr Wilkin 9ba9a94819 More robust reasoning detection 2026-02-11 13:47:29 +01:00
Piotr Wilkin 80b7e161ff Fix reasoning detection 2026-02-11 13:47:29 +01:00
Piotr Wilkin b0853baca7 Quick vibe-coded fix for proper object printing 2026-02-11 13:47:29 +01:00
Piotr Wilkin 1662fa5bea ANOTHER GIANT POST-FIXUP SQUISH 2026-02-11 13:47:29 +01:00
Piotr Wilkin 7e6f75a414 THE GIANT AUTOPARSER SQUISH 2026-02-11 13:47:29 +01:00
Piotr Wilkin 571805b348 Make call IDs nine-character 2026-02-11 13:47:29 +01:00
Piotr Wilkin 93f0cc05de Fix sanitizer warnings 2026-02-11 13:47:29 +01:00
Piotr Wilkin 96316496d5 Fix bad typo 2026-02-11 13:47:29 +01:00
Piotr Wilkin 9a3ac05157 Add workaround for templates requiring non-null content 2026-02-11 13:47:29 +01:00
Adrien Gallouët 0c1f39a9ae
common : improve download error reporting (#19491)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-11 09:27:55 +01:00
thecaptain789 8ee538ce73
llama : correct typos 'occured' and 'occurences' (#19414)
Co-authored-by: thecaptain789 <thecaptain789@users.noreply.github.com>
2026-02-11 07:05:31 +01:00
Xuan-Son Nguyen 98e57ca422
chat: fix case where template accepts type content only (#19419)
* chat: fix case where template accepts type content only

* rm stray log

* reuse render_message_to_json
2026-02-09 22:14:12 +01:00
Sascha Rogmann 292f6908cd
spec : remove check rate (#19377)
* spec: remove parameter spec-ngram-check-rate

* spec : renamed statistics vars

* spec : add n_call_begin, n_call_accept

* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Georgi Gerganov dfde5993ea
common : add common_speculative_is_compat() (#19270)
* llama : add llama_memory_can_rm_suffix()

* Revert "llama : add llama_memory_can_rm_suffix()"

This reverts commit d30e59b62a.

* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Xuan-Son Nguyen e0c93af2a0
debug: make common_debug_print_tensor readable (#19331)
* debug: make common_debug_print_tensor readable

* editorconfig
2026-02-04 17:55:31 +01:00
Georgi Gerganov d838c22bb3
spec : fix the check-rate logic of ngram-simple (#19261)
* spec : fix the check-rate logic of ngram-simple

* cont : refactor + fix checks
2026-02-04 10:39:53 +02:00
Georgi Gerganov aeb827a3cc
spec : simplify time measurement using common_time_meas (#19262) 2026-02-03 08:20:15 +02:00
Sid Mohan 0dfcd3b607
jinja : add missing 'in' test to template engine (#19004) (#19239)
* jinja : add missing 'in' test to template engine (#19004)

The jinja template parser was missing the 'in' test from
global_builtins(), causing templates using reject("in", ...),
select("in", ...), or 'x is in(y)' to fail with
"selectattr: unknown test 'in'".

This broke tool-calling for Qwen3-Coder and any other model
whose chat template uses the 'in' test.

Added test_is_in supporting array, string, and object containment
checks, mirroring the existing 'in' operator logic in runtime.cpp.

Includes test cases for all three containment types plus
reject/select filter usage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* reuse test_is_in in binary op

---------

Co-authored-by: Sid Mohan <sidmohan0@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-02-02 21:00:55 +01:00
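Note on 0dfcd3b607: the three containment cases named in the message (array, string, object) can be illustrated with a minimal sketch. This is not the code from the commit. The types `scalar`, `array`, `object`, `container` and the functions `is_in` and `reject_in` are hypothetical stand-ins, and the real jinja value class in common/jinja is richer than what is shown here. The sketch assumes Python-like semantics: substring match for strings, element equality for arrays, key lookup for objects.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified value types (assumption: not the real jinja value class).
using scalar    = std::string;
using array     = std::vector<scalar>;
using object    = std::map<scalar, scalar>;
using container = std::variant<scalar, array, object>;

// Returns true if `needle` is "in" `haystack`:
//   string -> substring match
//   array  -> element equality
//   object -> key lookup
inline bool is_in(const scalar & needle, const container & haystack) {
    if (const auto * s = std::get_if<scalar>(&haystack)) {
        return s->find(needle) != std::string::npos;
    }
    if (const auto * a = std::get_if<array>(&haystack)) {
        return std::find(a->begin(), a->end(), needle) != a->end();
    }
    const auto & o = std::get<object>(haystack);
    return o.count(needle) > 0;
}

// Analogue of reject("in", ...): keep only the items NOT contained in `excluded`.
inline array reject_in(const array & items, const container & excluded) {
    array out;
    for (const auto & item : items) {
        if (!is_in(item, excluded)) {
            out.push_back(item);
        }
    }
    return out;
}
```

Sharing one containment function between the test and the filter is the same idea as the follow-up bullet "reuse test_is_in in binary op": one definition of "in" keeps the operator, the test, and the filters consistent.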
Sascha Rogmann b4d05a3d2f
spec : various improvements to ngram-map + docs (#19253)
* spec: ngram-map and reasoning chats

* spec: add t_begin and t_accept

* ngram-map : add internal hash map

* docs : update ngram-map, add ngram-mod

* docs : fix ngram-map-k

* docs : differences between implementations
2026-02-02 08:26:58 +02:00
Georgi Gerganov 4927795810
ngram-mod : fix build [no ci] (#19216) 2026-01-30 21:27:27 +02:00
Georgi Gerganov dabaa2e77a
spec : add ngram-mod (#19164)
* spec : add ngram-mod

* cont : simplify + keep track of occupancy

* cont : cleanup

* cont : move initialization to common/speculative

* cont : cleanup

* cont : cleanup

* cont : fix
2026-01-30 18:21:48 +02:00
Marcello Seri 2e916f996a
jinja : add unordered_map include to value.h [no ci] (#19205)
On macOS Sequoia 15.7.3, x86_64, the build has recently started failing with
```
In file included from .../code/cpp/llama.cpp/common/jinja/string.cpp:2:
.../code/cpp/llama.cpp/common/./jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std'
  478 |     std::unordered_map<value, value, value_hasher, value_equivalence> unordered;
      |     ~~~~~^
In file included from .../code/cpp/llama.cpp/common/jinja/caps.cpp:1:
.../code/cpp/llama.cpp/common/jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std'
  478 |     std::unordered_map<value, value, value_hasher, value_equivalence> unordered;
      |     ~~~~~^
In file included from .../code/cpp/llama.cpp/common/jinja/value.cpp:1:
In file included from .../code/cpp/llama.cpp/common/jinja/runtime.h:4:
.../code/cpp/llama.cpp/common/jinja/value.h:478:10: error: no template named 'unordered_map' in namespace 'std'
  478 |     std::unordered_map<value, value, value_hasher, value_equivalence> unordered;
[...]
```

After a bit of digging to make sure all the appropriate flags were used, I noticed that the necessary header was not included. This fixes the build for me and should not negatively affect other builds that, for whatever reason, were already succeeding.
2026-01-30 16:09:44 +01:00
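Note on 2e916f996a: the error above is what a compiler reports when `std::unordered_map` is used without `<unordered_map>` being included, directly or through another header. Builds that succeed without the include are most likely picking it up transitively, which differs between standard library versions. The sketch below shows the same pattern as the failing line (a map with a custom hasher and equality functor) with the header included explicitly. The names `key`, `key_hasher`, `key_equal`, `key_map` and `lookup_or` are hypothetical and are not taken from value.h.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map> // declares std::unordered_map; do not rely on transitive includes

// Hypothetical key type standing in for the jinja `value` type.
struct key {
    std::string name;
};

// Custom hasher, analogous in role to `value_hasher`.
struct key_hasher {
    std::size_t operator()(const key & k) const {
        return std::hash<std::string>{}(k.name);
    }
};

// Custom equality functor, analogous in role to `value_equivalence`.
struct key_equal {
    bool operator()(const key & a, const key & b) const {
        return a.name == b.name;
    }
};

using key_map = std::unordered_map<key, int, key_hasher, key_equal>;

// Returns the mapped value, or `fallback` when the key is absent.
inline int lookup_or(const key_map & m, const key & k, int fallback) {
    const auto it = m.find(k);
    return it == m.end() ? fallback : it->second;
}
```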
Aldehir Rojas 7b7ae857f6
chat : add parsing for solar-open-100b (#18540)
* chat : add parsing for solar-open-100b

* add comments to rules

* cont : make assistant start optional

* cont : remove assistant start prefix altogether

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-01-29 16:06:15 +01:00
Sigbjørn Skjæret b45ef2702c
jinja : do not pass empty tools and add some none filters (#19176) 2026-01-29 14:06:54 +01:00
Georgi Gerganov eed25bc6b0
arg : add -kvu to llama-batched-bench (#19172) 2026-01-29 08:50:47 +02:00
Sascha Rogmann 72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
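Note on 72d3b1898a: the general idea behind draft-model-free speculation with n-grams can be sketched as follows. This is an illustration under stated assumptions, not the ngram-map implementation from the commit: the function `ngram_draft`, its parameters, and the `token` alias are hypothetical. The sketch looks for the most recent earlier occurrence of the last `n` tokens in the history and proposes the tokens that followed it as the draft, which the target model would then verify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using token = std::int32_t; // hypothetical token id type

// Propose up to `n_draft` tokens by matching the trailing n-gram of `history`
// against earlier positions and copying what followed the most recent match.
// Returns an empty draft when there is no earlier match.
inline std::vector<token> ngram_draft(const std::vector<token> & history,
                                      std::size_t n,
                                      std::size_t n_draft) {
    assert(n > 0);
    std::vector<token> draft;
    if (history.size() <= n) {
        return draft; // not enough context to hold an earlier occurrence
    }
    const std::size_t tail = history.size() - n; // start of the trailing n-gram

    // Scan candidate start positions from most recent to oldest,
    // excluding the trailing n-gram itself.
    for (std::size_t i = tail; i-- > 0; ) {
        bool match = true;
        for (std::size_t j = 0; j < n; ++j) {
            if (history[i + j] != history[tail + j]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }
        // Copy the tokens that followed the earlier occurrence.
        for (std::size_t k = i + n; k < history.size() && draft.size() < n_draft; ++k) {
            draft.push_back(history[k]);
        }
        break;
    }
    return draft;
}
```

For the history `{1, 2, 3, 4, 1, 2}` with `n = 2` and `n_draft = 3`, the trailing n-gram `{1, 2}` also occurs at the start, so the proposed draft is `{3, 4, 1}`. A linear scan like this is the simplest option; an index keyed on the n-gram would avoid rescanning the history on every call.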
Sigbjørn Skjæret 60368e1d73
jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147)
* undefined is treated as iterable (string/array) by filters

`tojson` is not a supported `undefined` filter

* add tests

* add sequence and iterable tests

keep it DRY and fix some types
2026-01-28 14:40:29 +01:00
Georgi Gerganov 631cbfcc7a
cuda : fix "V is K view" check for non-unified KV cache (#19145) 2026-01-28 09:15:27 +02:00
Georgi Gerganov c5c64f72ac
llama : disable Direct IO by default (#19109)
* llama : disable Direct IO by default

* cont : override mmap if supported
2026-01-28 09:11:13 +02:00
Sigbjørn Skjæret 2b4cbd2834
jinja : implement mixed type object keys (#18955)
* implement mixed type object keys

* add tests

* refactor

* minor fixes

* massive refactor

* add more tests

* forgotten tuples

* fix array/object is_hashable

* correct (albeit broken) jinja responses

verified with transformers

* improved hashing and equality

* refactor hash function

* more exhaustive test case

* clean up

* cont

* cont (2)

* missing cstring

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-27 19:50:42 +01:00