llama.cpp/common
James O'Leary 370cdb9f26 grammar : fix lazy trigger crash during generation_prompt prefill
When a lazy grammar trigger pattern matches text in the generation_prompt
(e.g. Functionary v3.2's >>>(?!all) matches >>> at the end of the prompt),
the grammar activates during prefill and crashes with 'Unexpected empty
grammar stack' because the trigger text doesn't match the grammar's
expected start.

Fix: catch the prefill exception, disable the grammar, and log a warning.
The model then generates unconstrained output, but the parser still
extracts tool calls. This is safe because:
- The trigger firing during prefill is a false positive (the trigger text
  is part of the prompt template, not model output)
- Grammar constraints are a generation optimization, not a correctness
  requirement -- the parser handles extraction

An earlier approach changed find_start_pos to not replay trigger text
through the grammar. That broke Nemotron, whose grammar root starts
with the trigger literal (<tool_call>) and needs the replay to advance
past it during generation. The catch approach is correct because it only
affects the prefill path where the trigger fires prematurely, while
leaving the generation-time replay intact.

Verified with Qwen3.5-0.8B + Functionary v3.2 template override: a
tools request now returns 200 instead of failing with 400.

Test:

  cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
  cmake --build build --target test-chat
  ./build/bin/test-chat
2026-03-19 19:58:23 -07:00
jinja jinja : add capability check for object args (#20612) 2026-03-16 17:43:14 +01:00
CMakeLists.txt common/parser: handle reasoning budget (#20297) 2026-03-11 10:26:12 +01:00
arg.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
arg.h vendor : update cpp-httplib to 0.30.0 (#18660) 2026-01-08 13:53:54 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-auto-parser-generator.cpp chat : handle tool calls with no required args in TAG_WITH_TAGGED format (#20764) 2026-03-19 17:53:11 +01:00
chat-auto-parser-helpers.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
chat-auto-parser-helpers.h common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
chat-auto-parser.h common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
chat-diff-analyzer.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
chat-peg-parser.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
chat-peg-parser.h common/parser: use nlohmann::ordered_json to preserve parameter order (#20385) 2026-03-11 10:26:51 +01:00
chat.cpp grammar : fix lazy trigger crash during generation_prompt prefill 2026-03-19 19:58:23 -07:00
chat.h common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
common.cpp llama : re-enable manual LoRA adapter free (#19983) 2026-03-18 12:03:26 +02:00
common.h common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
console.cpp cli : add command and file auto-completion (#19985) 2026-03-05 10:47:28 +01:00
console.h cli : add command and file auto-completion (#19985) 2026-03-05 10:47:28 +01:00
debug.cpp debug: make common_debug_print_tensor readable (#19331) 2026-02-04 17:55:31 +01:00
debug.h chore : correct typos [no ci] (#20041) 2026-03-05 08:50:21 +01:00
download.cpp build : remove LLAMA_HTTPLIB option (#19623) 2026-02-15 15:38:50 +01:00
download.h preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
http.h server: Parse port numbers from MCP server URLs in CORS proxy (#20208) 2026-03-09 17:47:54 +01:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp common : fix incorrect uses of stoul (#20313) 2026-03-10 11:40:26 +01:00
json-schema-to-grammar.h common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
llguidance.cpp sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-cache.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-map.cpp llama : correct typos 'occured' and 'occurences' (#19414) 2026-02-11 07:05:31 +01:00
ngram-map.h llama : correct typos 'occured' and 'occurences' (#19414) 2026-02-11 07:05:31 +01:00
ngram-mod.cpp spec : add ngram-mod (#19164) 2026-01-30 18:21:48 +02:00
ngram-mod.h ngram-mod : fix build [no ci] (#19216) 2026-01-30 21:27:27 +02:00
peg-parser.cpp common: consolidate PEG string parsers (#20263) 2026-03-10 00:29:21 +01:00
peg-parser.h common: consolidate PEG string parsers (#20263) 2026-03-10 00:29:21 +01:00
preset.cpp preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
preset.h common: support remote preset (#18520) 2026-01-08 22:35:40 +01:00
reasoning-budget.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
reasoning-budget.h common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
regex-partial.cpp common : fix iterator::end() dereference (#20445) 2026-03-16 08:50:38 +02:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp common/parser: add proper reasoning tag prefill reading (#20424) 2026-03-19 16:58:21 +01:00
sampling.h sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
speculative.cpp spec : remove check rate (#19377) 2026-02-09 15:30:50 +02:00
speculative.h common : add common_speculative_is_compat() (#19270) 2026-02-06 16:47:22 +02:00
unicode.cpp common/parser: handle reasoning budget (#20297) 2026-03-11 10:26:12 +01:00
unicode.h common/parser: handle reasoning budget (#20297) 2026-03-11 10:26:12 +01:00