Adrien Gallouët
fa6542d12b
Merge 3645fee1ed into 49bfddeca1
2026-03-22 20:58:04 +01:00
Adrien Gallouët
3645fee1ed
Check all inputs
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-22 19:08:26 +00:00
Sigbjørn Skjæret
23c9182ce8
jinja : refactor token advancement ( #20864 )
...
* refactor token advancement
* exercise sub-expressions
2026-03-22 17:45:10 +01:00
Adrien Gallouët
6ab630f5f8
Use final_path..
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-22 09:34:26 +00:00
Adrien Gallouët
74c1874072
Use cached files when HF API fails
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-22 09:30:30 +00:00
Adrien Gallouët
5572986afa
Prefer main when getting cached ref
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-22 09:18:48 +00:00
Adrien Gallouët
77fa9a9990
Restore common_cached_model_info and align mmproj filtering
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-22 09:04:08 +00:00
ddh0
3306dbaef7
misc : prefer ggml-org models in docs and examples ( #20827 )
...
* misc : prefer ggml-org models in docs and examples
Prefer referring to known-good quantizations under ggml-org rather than
3rd-party uploaders.
* remove accidentally committed file
2026-03-21 22:00:26 +01:00
Piotr Wilkin (ilintar)
b1c70e2e54
common/parser: fix nasty bug causing subtle corruption of generation prompt ( #20825 )
2026-03-21 00:19:04 +01:00
Adrien Gallouët
e404f6ab1c
Improve error handling and report API errors
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-20 22:15:55 +00:00
Adrien Gallouët
b6c7bcfa62
Cleanup
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-20 18:06:33 +00:00
James O'Leary
149b2493c0
common : fix typo in debug log ('extracft' -> 'extract') ( #20807 )
2026-03-20 18:23:18 +01:00
Adrien Gallouët
6fd16ba05c
common : add standard Hugging Face cache support
...
- Use HF API to find all files
- Migrate all manifests to Hugging Face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-20 14:11:20 +00:00
Ruikai Peng
21c8045214
jinja : fix heap OOB read in value equality comparison ( #20782 )
...
Address GHSA-q9j6-4hhc-rq9p and GHSA-2q4c-9gq5-5vfp.
The three-iterator overload of std::equal in value_array_t::equivalent()
and value_object_t::equivalent() reads past the end of the shorter
container when comparing arrays or objects of different lengths.
Use the four-iterator overload (C++14) which checks both range lengths.
Found-by: Pwno
2026-03-20 07:15:17 +01:00
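The difference between the two std::equal overloads mentioned in 21c8045214 can be illustrated with a minimal sketch. The helper name `equivalent_safe` and the use of `std::vector<int>` are assumptions made for illustration only; the project's `value_array_t::equivalent()` and `value_object_t::equivalent()` operate on their own container types.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an element-wise equivalence check.
// Uses the four-iterator overload of std::equal (C++14), which returns
// false when the two ranges differ in length and never reads past either end.
static bool equivalent_safe(const std::vector<int> & a, const std::vector<int> & b) {
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}

// Small self-check. The three-iterator form
//   std::equal(longer.begin(), longer.end(), shorter.begin())
// would read one element past the end of `shorter` (undefined behavior),
// so it is deliberately not executed here.
static void check_equivalent_safe() {
    const std::vector<int> longer  = {1, 2, 3};
    const std::vector<int> shorter = {1, 2};
    assert(!equivalent_safe(longer, shorter));
    assert(!equivalent_safe(shorter, longer));
    assert(equivalent_safe(longer, longer));
}
```

With the four-iterator overload, a length mismatch is simply reported as "not equal", so no separate size check is needed before the comparison.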
James O'Leary
c46583b86b
common/parser : fix out_of_range crash in throw path ( #20424 regression) ( #20777 )
...
* chat : fix out_of_range crash in throw path (#20424 regression)
#20424 introduced effective_input = generation_prompt + input, but the
throw path uses input.substr(result.end) where result.end is a position
within effective_input. Every thinking model with a non-empty
generation_prompt crashes with std::out_of_range instead of the intended
error message.
Test crashes on unpatched master, passes with fix:
cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
cmake --build build --target test-chat
./build/bin/test-chat
* Update test-chat.cpp
* Update test-chat.cpp
* Update test-chat.cpp
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-20 02:37:22 +01:00
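The failure mode described in c46583b86b can be reproduced in isolation. This is only a sketch under stated assumptions: the function names `remainder_buggy` and `remainder_fixed` are hypothetical, and the actual patch may resolve the offset differently. It shows why an offset measured within `generation_prompt + input` must not be applied to `input` alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical reproduction of the bug: `end` is an offset into
// generation_prompt + input, but it is applied to `input` alone.
// std::string::substr throws std::out_of_range when end > input.size().
static std::string remainder_buggy(const std::string & input, std::size_t end) {
    return input.substr(end);
}

// Hypothetical fix: apply the offset to the same string it was measured in.
// The clamp is a defensive assumption so an offset past the end yields "".
static std::string remainder_fixed(const std::string & generation_prompt,
                                   const std::string & input,
                                   std::size_t end) {
    const std::string effective_input = generation_prompt + input;
    return effective_input.substr(std::min(end, effective_input.size()));
}

// Returns true if the buggy variant throws std::out_of_range for this offset.
static bool buggy_throws(const std::string & input, std::size_t end) {
    try {
        (void) remainder_buggy(input, end);
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}
```

With a non-empty generation prompt such as "<think>\n" (8 characters) and a 3-character input, any offset greater than 3 makes the buggy variant throw, while the fixed variant returns the intended remainder.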
James O'Leary
76f2dc70c3
chat : handle tool calls with no required args in TAG_WITH_TAGGED format ( #20764 )
...
* chat : handle tool calls with no required args in TAG_WITH_TAGGED format
* Update tests/test-chat.cpp [no ci]
Co-authored-by: Aldehir Rojas <hello@alde.dev>
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
Co-authored-by: Aldehir Rojas <hello@alde.dev>
2026-03-19 17:53:11 +01:00
Piotr Wilkin (ilintar)
5e54d51b19
common/parser: add proper reasoning tag prefill reading ( #20424 )
...
* Implement proper prefill extraction
* Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp
* Update tools/server/server-task.cpp
* refactor: move grammars to variant, remove grammar_external, handle exception internally
* Make code less C++y
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-19 16:58:21 +01:00
ddh0
922b90e567
common : add LLAMA_ARG_SPEC_TYPE ( #20744 )
2026-03-19 16:16:55 +01:00
Aldehir Rojas
1b9bbaa357
common : fix gpt-oss content removal ( #20745 )
2026-03-19 11:40:39 +01:00
Pop Flamingo
312cf03328
llama : re-enable manual LoRA adapter free ( #19983 )
...
* Re-enable manual LoRA adapter free
* Remove stale "all adapters must be loaded before context creation" comments
2026-03-18 12:03:26 +02:00
Aldehir Rojas
5e8910a0db
common : rework gpt-oss parser ( #20393 )
...
* common : rework gpt-oss parser
* cont : fix gpt-oss tests
* cont : add structured output test
* cont : rename final to final_msg
2026-03-18 10:41:25 +01:00
Piotr Wilkin (ilintar)
d2ecd2d1cf
common/parser: add `--skip-chat-parsing` to force a pure content parser. ( #20289 )
...
* Add `--force-pure-content` to force a pure content parser.
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Change parameter name [no ci]
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 16:16:43 +01:00
Aldehir Rojas
1bbec6a75d
jinja : add capability check for object args ( #20612 )
2026-03-16 17:43:14 +01:00
Masato Nakasaka
d3936498a3
common : fix iterator::end() dereference ( #20445 )
2026-03-16 08:50:38 +02:00
Eric Hsieh
559646472d
fix: prevent nullptr dereference ( #20552 )
2026-03-15 16:51:49 +01:00
Piotr Wilkin (ilintar)
1430c35948
common/parser: gracefully handle undetected tool parser, print error message. ( #20286 )
2026-03-13 20:56:10 +01:00
Ruben Ortlam
128142fe7d
test-backend-ops: allow loading tests from file and parsing model operators into file ( #19896 )
...
* tests: allow loading test-backend-ops tests from json
* add error threshold based on op
* add error when file cannot be read
* add graph operator json extraction tool
* add nb parameter for non-contiguous input tensors
* fix view check
* only use view if non-contiguous/permuted, use C++ random instead of rand()
* replace internal API calls with public llama_graph_reserve call
* reduce test description length
* fix nb[0] not getting set for view
* add name to tests
* fix inplace error
* use text file instead of json
* move llama_graph_reserve function to new llama-ext header, move export-graph-ops to tests/
* fix missing declaration
* use pragma once
* fix indent
* fix Windows build
2026-03-12 13:26:00 +01:00
Daniel Bevenius
6de1bc631d
common : update completion executables list [no ci] ( #19934 )
...
This commit updates the bash completion executables list, adding missing
executables and removing some that no longer exist.
2026-03-12 12:12:01 +01:00
Mishusha
a8304b4d27
common/parser: add GigaChatV3/3.1 models support ( #19931 )
...
Co-authored-by: Mishusha <pmv26021975@gmail.com>
2026-03-12 01:22:25 +01:00
ddh0
4a748b8f15
common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up ( #20416 )
2026-03-12 00:13:28 +01:00
Aldehir Rojas
b5fe4559ae
common/parser: use nlohmann::ordered_json to preserve parameter order ( #20385 )
2026-03-11 10:26:51 +01:00
Piotr Wilkin (ilintar)
acb7c79069
common/parser: handle reasoning budget ( #20297 )
...
* v1
* Finished!
* Handle cli
* Reasoning sampler
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Less explosive terminology :)
* Add utf-8 case and tests
* common : migrate reasoning budget sampler to common
* cont : clean up
* cont : expose state and allow passing as initial state
* cont : remove unused imports
* cont : update state machine doc string
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>
2026-03-11 10:26:12 +01:00
Piotr Wilkin (ilintar)
6c770d16ca
Reduce level of content parser warning message to avoid log spam on non-debug verbosity ( #20347 )
2026-03-10 15:21:51 +01:00
Sigbjørn Skjæret
ec947d2b16
common : fix incorrect uses of stoul ( #20313 )
2026-03-10 11:40:26 +01:00
Aldehir Rojas
c96f608d98
common: consolidate PEG string parsers ( #20263 )
...
* common : consolidate PEG string parsers
* cont : fix json_string_content()
2026-03-10 00:29:21 +01:00
Evan Huus
23fbfcb1ad
server: Parse port numbers from MCP server URLs in CORS proxy ( #20208 )
...
* Parse port numbers from MCP server URLs
* Pass scheme to http proxy for determining whether to use SSL
* Fix download on non-standard port and re-add port to logging
* add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-03-09 17:47:54 +01:00
Piotr Wilkin (ilintar)
f76565db92
common: map developer role to system ( #20215 )
...
* Map developer role to system
* Simplify
2026-03-09 14:25:11 +01:00
Piotr Wilkin (ilintar)
97c64fbdbd
PEG parser for LFM2 ( #20251 )
...
* PEG parser for LFM2
* Simplify using python_value()
2026-03-09 01:11:22 +01:00
Aldehir Rojas
451ef08432
common : gracefully handle incomplete output ( #20191 )
...
* common : handle incomplete UTF-8 at end of input in PEG parser
* cont : if reached end prematurely, emit needs_more_input to propagate partial output
* cont: refactor peg parse context to add lenient flag
* cont : remove partial flag, keep lenient flag
2026-03-08 17:17:02 +01:00
Piotr Wilkin (ilintar)
9b24886f78
Fix compile bug ( #20203 )
...
* Fix compile bug
* Update common/chat-auto-parser-helpers.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-08 17:15:49 +01:00
Piotr Wilkin (ilintar)
62b8143ad2
Fix structured outputs ( #20223 )
...
* Fix structured outputs
* Update common/chat-auto-parser-generator.cpp
Co-authored-by: Aldehir Rojas <hello@alde.dev>
---------
Co-authored-by: Aldehir Rojas <hello@alde.dev>
2026-03-08 17:14:43 +01:00
Johannes Gäßler
a976ff081b
llama: end-to-end tests ( #19802 )
...
* tests: add end-to-end tests per model architecture
* fixup for rebase
* fix use-after-free in llama-model-loader.cpp
* fix CI
* fix WebGPU
* fix CI
* disable CI for macOS-latest-cmake-arm64
* use expert_weights_scale only if != 0.0f
* comments
2026-03-08 12:30:21 +01:00
Piotr Wilkin (ilintar)
b283f6d5b3
Revert to OAI-compatible args ( #20213 )
...
* Revert to OAI-compatible args
* Apply workaround::func_args_not_string
2026-03-08 11:33:03 +01:00
Piotr Wilkin (ilintar)
c024d85908
Autoparser: True streaming ( #20177 )
...
* Relax atomicity constraint for nicer, more pleasant, True Streaming parsing
* Whitespace
* Remove redundant atomics
2026-03-07 01:55:33 +01:00
Piotr Wilkin (ilintar)
2f2923f895
Autoparser: add optional argument reshuffle capability ( #20171 )
...
* Allow reshuffled arguments in tagged argument parser format tool calls.
* Remove shuffle just keep the optional parsers in any order
* Remove unnecessary import
2026-03-06 22:34:15 +01:00
Piotr Wilkin (ilintar)
566059a26b
Autoparser - complete refactoring of parser architecture ( #18675 )
...
* Autoparser - full single commit squish
* Final pre-merge changes: minor fixes, Kimi 2.5 model parser
2026-03-06 21:01:00 +01:00
Piotr Wilkin (ilintar)
f5ddcd1696
Checkpoint every n tokens: squash ( #20087 )
2026-03-06 11:39:26 +01:00
Aleksander Grygier
f6235a41ef
webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts ( #18655 )
2026-03-06 10:00:39 +01:00
Sigbjørn Skjæret
b5ed0e058c
cli : add command and file auto-completion ( #19985 )
2026-03-05 10:47:28 +01:00
Marcel Petrick
92f7da00b4
chore : correct typos [no ci] ( #20041 )
...
* fix(docs): correct typos found during code review
Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
* Update docs/backend/CANN.md
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"
This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00