llama.cpp

James O'Leary 370cdb9f26 grammar : fix lazy trigger crash during generation_prompt prefill

When a lazy grammar trigger pattern matches text in the generation_prompt (e.g. Functionary v3.2's >>>(?!all) matches >>> at the end of the prompt), the grammar activates during prefill and crashes with 'Unexpected empty grammar stack' because the trigger text doesn't match the grammar's expected start.

Fix: catch the prefill exception, disable grammar, and warn. The model generates unconstrained but the parser still extracts tool calls.

This is safe because:
- The trigger firing during prefill is a false positive (the trigger text is part of the prompt template, not model output)
- Grammar constraints are a generation optimization, not a correctness requirement -- the parser handles extraction

An earlier approach changed find_start_pos to not replay trigger text through the grammar. That broke Nemotron, whose grammar root starts with the trigger literal (<tool_call>) and needs the replay to advance past it during generation. The catch approach is correct because it only affects the prefill path where the trigger fires prematurely, while leaving the generation-time replay intact.

Verified with Qwen3.5-0.8B + Functionary v3.2 template override: tools request returns 200 instead of crashing with 400.

Test:
    cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF
    cmake --build build --target test-chat
    ./build/bin/test-chat

2026-03-19 19:58:23 -07:00
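The catch-and-disable behaviour described in the commit message can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical and written only for illustration: `toy_grammar`, `toy_sampler`, and `prefill_trigger` are not llama.cpp types or functions, and the real change in chat.cpp may be structured quite differently. The sketch only shows the general idea: a failure while replaying trigger text during prefill turns the grammar off with a warning instead of propagating the exception.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a lazy grammar; NOT llama.cpp's real type.
struct toy_grammar {
    std::string expected_start;  // literal the grammar root must begin with
    bool        active = false;  // set once the trigger text has been accepted

    // Replay text through the grammar; throws if it does not begin with
    // the literal the root expects.
    void accept(const std::string & text) {
        if (text.compare(0, expected_start.size(), expected_start) != 0) {
            throw std::runtime_error("Unexpected empty grammar stack");
        }
        active = true;
    }
};

// Hypothetical sampler state holding the grammar and an on/off switch.
struct toy_sampler {
    toy_grammar grammar;
    bool        grammar_enabled = true;
};

// Prefill path only (assumption: generation-time replay is handled elsewhere
// and is left untouched). A trigger match here comes from the prompt template,
// so a failure disables the grammar instead of aborting the request.
// Returns true if the grammar is still enabled afterwards.
bool prefill_trigger(toy_sampler & s, const std::string & trigger_text) {
    if (!s.grammar_enabled) {
        return false;
    }
    try {
        s.grammar.accept(trigger_text);
    } catch (const std::exception & e) {
        std::cerr << "warning: grammar disabled during prefill: " << e.what() << "\n";
        s.grammar_enabled = false;
    }
    return s.grammar_enabled;
}
```

With a grammar whose `expected_start` is `<tool_call>`, replaying `<tool_call>` keeps the grammar enabled, which mirrors the Nemotron case where the root starts with the trigger literal. Replaying `>>>` against a grammar that expects something else disables the grammar and leaves generation unconstrained, which mirrors the Functionary v3.2 case.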
jinja	jinja : add capability check for object args (#20612)	2026-03-16 17:43:14 +01:00
CMakeLists.txt	common/parser: handle reasoning budget (#20297)	2026-03-11 10:26:12 +01:00
arg.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
arg.h	vendor : update cpp-httplib to 0.30.0 (#18660)	2026-01-08 13:53:54 +01:00
base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)	2025-06-13 10:38:52 +02:00
chat-auto-parser-generator.cpp	chat : handle tool calls with no required args in TAG_WITH_TAGGED format (#20764)	2026-03-19 17:53:11 +01:00
chat-auto-parser-helpers.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
chat-auto-parser-helpers.h	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
chat-auto-parser.h	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
chat-diff-analyzer.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
chat-peg-parser.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
chat-peg-parser.h	common/parser: use nlohmann::ordered_json to preserve parameter order (#20385)	2026-03-11 10:26:51 +01:00
chat.cpp	grammar : fix lazy trigger crash during generation_prompt prefill	2026-03-19 19:58:23 -07:00
chat.h	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
common.cpp	llama : re-enable manual LoRA adapter free (#19983)	2026-03-18 12:03:26 +02:00
common.h	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
console.cpp	cli : add command and file auto-completion (#19985)	2026-03-05 10:47:28 +01:00
console.h	cli : add command and file auto-completion (#19985)	2026-03-05 10:47:28 +01:00
debug.cpp	debug: make common_debug_print_tensor readable (#19331)	2026-02-04 17:55:31 +01:00
debug.h	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
download.cpp	build : remove LLAMA_HTTPLIB option (#19623)	2026-02-15 15:38:50 +01:00
download.h	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
http.h	server: Parse port numbers from MCP server URLs in CORS proxy (#20208)	2026-03-09 17:47:54 +01:00
json-partial.cpp	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
json-partial.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp	common : fix incorrect uses of stoul (#20313)	2026-03-10 11:40:26 +01:00
json-schema-to-grammar.h	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
llguidance.cpp	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
log.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
log.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
ngram-cache.cpp	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-cache.h	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-map.cpp	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-map.h	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-mod.cpp	spec : add ngram-mod (#19164)	2026-01-30 18:21:48 +02:00
ngram-mod.h	ngram-mod : fix build [no ci] (#19216)	2026-01-30 21:27:27 +02:00
peg-parser.cpp	common: consolidate PEG string parsers (#20263)	2026-03-10 00:29:21 +01:00
peg-parser.h	common: consolidate PEG string parsers (#20263)	2026-03-10 00:29:21 +01:00
preset.cpp	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
preset.h	common: support remote preset (#18520)	2026-01-08 22:35:40 +01:00
reasoning-budget.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
reasoning-budget.h	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
regex-partial.cpp	common : fix iterator::end() dereference (#20445)	2026-03-16 08:50:38 +02:00
regex-partial.h	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
sampling.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
sampling.h	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
speculative.cpp	spec : remove check rate (#19377)	2026-02-09 15:30:50 +02:00
speculative.h	common : add common_speculative_is_compat() (#19270)	2026-02-06 16:47:22 +02:00
unicode.cpp	common/parser: handle reasoning budget (#20297)	2026-03-11 10:26:12 +01:00
unicode.h	common/parser: handle reasoning budget (#20297)	2026-03-11 10:26:12 +01:00