* grammar : add support for std::regex_search() with trigger patterns
* common : update hermes2 pro trigger to search instead of match
* common : use regex_search with anchoring for partial matching
* common : adjust regex partial tests to use new pattern
* grammar : check pattern directly instead of adding a type
* common : adjust existing patterns to match new semantics
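For illustration, here is a minimal standalone sketch of the search-versus-match difference that the items above rely on, using only `<regex>`. `find_trigger` and `trigger_at_start` are hypothetical helper names, and the `<tool_call>` pattern is only an example input. This is not the implementation in common, and it does not reproduce the partial-match handling. It only shows anchoring at the start of the text via `std::regex_constants::match_continuous`.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Returns the offset at which `trigger` first matches anywhere in `text`,
// or std::string::npos when it does not occur. std::regex_match would instead
// require the pattern to cover the whole text, which is why a pattern written
// for match semantics needs something like a leading [\s\S]*? to skip output
// that precedes the trigger.
std::size_t find_trigger(const std::string& text, const std::regex& trigger) {
    std::smatch m;
    if (std::regex_search(text, m, trigger)) {
        return static_cast<std::size_t>(m.position(0));
    }
    return std::string::npos;
}

// Anchored variant: match_continuous makes regex_search accept only a match
// that begins at the first character of `text`.
bool trigger_at_start(const std::string& text, const std::regex& trigger) {
    return std::regex_search(text, trigger, std::regex_constants::match_continuous);
}
```

With `std::regex trig("<tool_call>")`, `find_trigger("Sure! <tool_call>{}", trig)` returns 6, while `std::regex_match` on the same text fails because the text has a prefix before the trigger.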
The change introduces a maximum repetition threshold to avoid
excessive rule expansion during grammar parsing. When parsing
repetition patterns like {m,n}, the parser now calculates the
potential number of rules that would be generated and throws an error
if the product of previous rules and new rules exceeds the threshold.
A test case was added to verify the threshold is properly enforced for
deeply nested repetition patterns that would otherwise cause hangs.
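The threshold check described above can be sketched as follows, assuming the parser tracks how many rule copies enclosing repetitions already imply. `checked_rule_count` and `kMaxRepetitionThreshold` are hypothetical names, and the value 2000 is illustrative only. The product test is written as a division so that it cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Illustrative limit only; not necessarily the value used by the parser.
constexpr std::uint64_t kMaxRepetitionThreshold = 2000;

// `prev_rules` is the number of copies already implied by enclosing
// repetitions and `new_rules` the number a repetition like x{m,n} would add.
// Throws if their product exceeds the threshold; the comparison is written as
// a division so that the product itself can never overflow.
std::uint64_t checked_rule_count(std::uint64_t prev_rules, std::uint64_t new_rules) {
    if (new_rules != 0 && prev_rules > kMaxRepetitionThreshold / new_rules) {
        throw std::runtime_error("repetition would exceed the maximum number of rules");
    }
    return prev_rules * new_rules;
}
```

For a nested pattern such as `((x{1000}){1000})`, the inner repetition gives `checked_rule_count(1, 1000) == 1000`, and the outer one calls `checked_rule_count(1000, 1000)`, which throws instead of expanding a million rules.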
This change converts llama_grammar_advance_stack to an iterative approach
using explicit stacks, which prevents deep recursion and eliminates the risk
of stack overflow.
Converting llama_grammar_advance_stack from recursive to iterative fixes the
following segfault:
[..]
⚫ Testing segfault. Grammar:
root ::= ( [x]* )*
Segmentation fault build/bin/test-grammar-integration
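The iterative idea can be illustrated with a toy model. It is a sketch under stated assumptions, not the code in src/llama-grammar.cpp. Symbols are plain ints: values below `kRule` are literal characters, and values at or above `kRule` are rule references. A grammar "stack" is expanded with an explicit work list instead of recursion. All names (`advance_stack`, `Rules`, `nested_star_grammar`, `kRule`) are hypothetical. The `seen` set is an added assumption: without it, the loop would also never finish on `( [x]* )*`, because an empty alternative leads back to the stack it started from.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy encoding (an assumption, not llama.cpp's llama_grammar_element):
// symbols below kRule are literal characters, symbols >= kRule reference
// rule number (symbol - kRule).
using Symbol = int;
using Stack = std::vector<Symbol>;        // top of the stack is back()
using Alternative = std::vector<Symbol>;  // a sequence of symbols
using Rules = std::vector<std::vector<Alternative>>;

constexpr Symbol kRule = 256;

// Expands rule references on top of `start` until every resulting stack is
// either empty (input may end here) or has a literal on top. An explicit work
// list replaces recursion, and the `seen` set stops cycles through empty
// alternatives, as in ( [x]* )*, from being expanded again and again.
std::set<Stack> advance_stack(const Rules& rules, const Stack& start) {
    std::set<Stack> result;
    std::set<Stack> seen;
    std::vector<Stack> todo{start};

    while (!todo.empty()) {
        Stack curr = std::move(todo.back());
        todo.pop_back();
        if (!seen.insert(curr).second) {
            continue;  // this exact stack was already expanded
        }
        if (curr.empty() || curr.back() < kRule) {
            result.insert(curr);  // literal on top, or end of input allowed
            continue;
        }
        const std::size_t rule_id = static_cast<std::size_t>(curr.back() - kRule);
        assert(rule_id < rules.size());
        curr.pop_back();
        for (const Alternative& alt : rules[rule_id]) {
            Stack next = curr;
            // push the alternative reversed so its first symbol ends up on top
            next.insert(next.end(), alt.rbegin(), alt.rend());
            todo.push_back(std::move(next));
        }
    }
    return result;
}

// Hand-desugared form of root ::= ( [x]* )*:
//   root ::= S    S ::= A S | (empty)    A ::= X    X ::= 'x' X | (empty)
enum ToyRule : int { kRoot, kS, kA, kX };

Rules nested_star_grammar() {
    Rules rules(4);
    rules[kRoot] = {Alternative{kRule + kS}};
    rules[kS]    = {Alternative{kRule + kA, kRule + kS}, Alternative{}};
    rules[kA]    = {Alternative{kRule + kX}};
    rules[kX]    = {Alternative{'x', kRule + kX}, Alternative{}};
    return rules;
}
```

Starting from `Stack{kRule + kRoot}`, the expansion terminates with two stacks: an empty one (the input may end) and one with `'x'` on top.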
* Fix DoS / integer overflow
* Remove optional, use INT64_MAX instead as placeholder value (it's technically -1, so it fits :)
* Whitespace
* Actually, since it's unsigned, use UINT64_MAX
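A minimal sketch of parsing a `{m}`, `{m,}` or `{m,n}` quantifier with `UINT64_MAX` standing in for "no upper bound", and with overflow-checked digit accumulation. It is an illustration under assumptions, not the code in src/llama-grammar.cpp. `Repetition`, `parse_count`, `parse_repetition` and `kNoMax` are hypothetical names. Rejecting any count that would reach the sentinel is a design choice of this sketch that keeps user input from being confused with "unbounded".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Sentinel for an open upper bound as in x{2,}; replaces std::optional.
constexpr std::uint64_t kNoMax = UINT64_MAX;

struct Repetition {
    std::uint64_t min_times;
    std::uint64_t max_times;  // kNoMax when unbounded
};

// Parses the decimal number at body[pos...]. Any value that would reach
// kNoMax is rejected, which both prevents uint64_t overflow and keeps the
// sentinel from being produced by user input.
static std::uint64_t parse_count(const std::string& body, std::size_t& pos) {
    if (pos >= body.size() || body[pos] < '0' || body[pos] > '9') {
        throw std::runtime_error("expected a repetition count");
    }
    std::uint64_t value = 0;
    while (pos < body.size() && body[pos] >= '0' && body[pos] <= '9') {
        const std::uint64_t digit = static_cast<std::uint64_t>(body[pos] - '0');
        if (value > (kNoMax - 1 - digit) / 10) {
            throw std::runtime_error("repetition count is too large");
        }
        value = value * 10 + digit;
        ++pos;
    }
    return value;
}

// Parses the inside of a {m}, {m,} or {m,n} quantifier (braces excluded).
Repetition parse_repetition(const std::string& body) {
    std::size_t pos = 0;
    Repetition r{};
    r.min_times = parse_count(body, pos);
    r.max_times = r.min_times;  // {m}
    if (pos < body.size() && body[pos] == ',') {
        ++pos;
        r.max_times = pos == body.size() ? kNoMax                    // {m,}
                                         : parse_count(body, pos);  // {m,n}
    }
    if (pos != body.size() || r.max_times < r.min_times) {
        throw std::runtime_error("malformed repetition");
    }
    return r;
}
```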
* sampler: turn lazy grammar trigger words to regexes
* add scripts/tool_bench.sh & .py
* constrain llama json output regardless of function name if it matches at the beginning
* update relaxed newline space rule in grammar tests
* support add_generation_prompt query parameter (useful for /apply_template)
* Update src/llama-grammar.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
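Turning a literal lazy-grammar trigger word into a regex requires escaping regex metacharacters. The helper below is a hypothetical sketch (the name `regex_escape` and its exact character set are assumptions, not necessarily what common or the sampler uses). It relies on the `$&` substitution of `std::regex_replace`'s default ECMAScript format, which inserts the whole match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Escapes ECMAScript regex metacharacters so that a literal trigger word can
// be used as a pattern, e.g. "[TOOL_CALLS]" becomes "\[TOOL_CALLS\]".
std::string regex_escape(const std::string& word) {
    static const std::regex special_chars(R"([.^$|()*+?\[\]{}\\])");
    // "$&" inserts the matched character, preceded here by a backslash
    return std::regex_replace(word, special_chars, R"(\$&)");
}
```

After escaping, `std::regex(regex_escape("a.b"))` matches only the literal text `a.b`, not `axb`.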
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.
The motivation is consistency and improved readability.
* extract & return thoughts in reasoning_content field (unless --reasoning-format) for DeepSeek R1 & Command R7B
* tool-calls: add deepseek r1 template (models/templates/llama-cpp-deepseek-r1.jinja) + hackommodate broken official template
* tool-calls: accommodate the variety of wrong tool call opening tags that both R1 Qwen 32B and 7B distills like to spit out
* server/oai: ensure content is null when there are tool calls, and reasoning_content appears before content for readability
* tool-calls: add DeepSeek R1 Qwen distills to server/README.md & server tests
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
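As a rough illustration of extracting thoughts into a separate reasoning_content field, here is a simplified sketch that splits output of the form `<think>...</think>answer`. The tag strings are passed in rather than hard-coded, because the tags differ between model families. `ParsedMessage` and `extract_reasoning` are hypothetical names. Treating an unterminated block as all reasoning is an assumption of this sketch, not necessarily the server's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct ParsedMessage {
    std::string reasoning_content;
    std::string content;
};

// Splits `output` into reasoning and content when it starts (after optional
// whitespace) with `open`. If it does not, everything is content.
ParsedMessage extract_reasoning(const std::string& output,
                                const std::string& open,
                                const std::string& close) {
    ParsedMessage msg;
    const std::size_t start = output.find_first_not_of(" \t\r\n");
    if (start == std::string::npos || output.compare(start, open.size(), open) != 0) {
        msg.content = output;
        return msg;
    }
    const std::size_t body = start + open.size();
    const std::size_t end = output.find(close, body);
    if (end == std::string::npos) {
        // unterminated block: treat the remainder as reasoning
        msg.reasoning_content = output.substr(body);
        return msg;
    }
    msg.reasoning_content = output.substr(body, end - body);
    // skip whitespace between the closing tag and the answer
    const std::size_t rest = output.find_first_not_of(" \t\r\n", end + close.size());
    if (rest != std::string::npos) {
        msg.content = output.substr(rest);
    }
    return msg;
}
```

For example, `extract_reasoning("<think>2+2=4</think>\n\nThe answer is 4.", "<think>", "</think>")` yields reasoning `2+2=4` and content `The answer is 4.`.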
- Add `struct llama_sampler` and `struct llama_sampler_i`
- Add `llama_sampler_` API
- Add `llama_sampler_chain_` API for chaining multiple samplers
- Remove `LLAMA_API_INTERNAL`
- Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`
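The chaining idea behind a sampler interface can be sketched with a toy model that mirrors the general shape (a table of function pointers plus an opaque context). The types `toy_sampler`, `toy_sampler_i` and `Candidate` are hypothetical stand-ins, not the declarations in llama.h. The real `llama_sampler_i` has more callbacks than the single `apply` modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy candidate list; a stand-in for a token/logit array (assumption).
struct Candidate { int id; float logit; };
using Candidates = std::vector<Candidate>;

// Toy interface: a table of function pointers plus an opaque context, so
// samplers of different kinds can be stored and chained uniformly.
struct toy_sampler;
struct toy_sampler_i {
    void (*apply)(toy_sampler* smpl, Candidates& cur);
};
struct toy_sampler {
    const toy_sampler_i* iface;
    void* ctx;
};

// top-k: keep the k highest logits (ctx points to a std::size_t k)
static void top_k_apply(toy_sampler* smpl, Candidates& cur) {
    const std::size_t k = *static_cast<std::size_t*>(smpl->ctx);
    std::sort(cur.begin(), cur.end(),
              [](const Candidate& a, const Candidate& b) { return a.logit > b.logit; });
    if (cur.size() > k) cur.resize(k);
}
static const toy_sampler_i top_k_iface = {top_k_apply};

// greedy: keep only the single best candidate
static void greedy_apply(toy_sampler*, Candidates& cur) {
    auto best = std::max_element(cur.begin(), cur.end(),
                                 [](const Candidate& a, const Candidate& b) { return a.logit < b.logit; });
    if (best != cur.end()) cur = Candidates{*best};
}
static const toy_sampler_i greedy_iface = {greedy_apply};

// chain: ctx points to a std::vector<toy_sampler*>, applied in insertion order
static void chain_apply(toy_sampler* smpl, Candidates& cur) {
    for (toy_sampler* s : *static_cast<std::vector<toy_sampler*>*>(smpl->ctx)) {
        s->iface->apply(s, cur);
    }
}
static const toy_sampler_i chain_iface = {chain_apply};
```

Because a chain is itself a `toy_sampler`, it can be nested or used wherever a single sampler is expected. Applying a chain of top-k (k = 2) followed by greedy to candidates with logits 0.1, 2.5 and 1.0 leaves only the candidate with logit 2.5.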