Unified Auto-Parser Architecture
The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.
Overview
The unified auto-parser uses a two-phase incremental analysis approach:
- Phase 1: Content & Reasoning Analysis - Analyzes how the template handles basic content and reasoning, without considering tools
- Phase 2: Tool Call Analysis - Analyzes tool calling patterns, layered on top of Phase 1
Data Structures
content_structure (Phase 1 Result)
Describes how the template handles content and reasoning:
struct content_structure {
enum reasoning_mode_type {
REASONING_NONE, // No reasoning markers detected
REASONING_OPTIONAL, // <think>...</think> may appear before content
REASONING_FORCED_OPEN, // Template ends with open reasoning tag OR starts implicitly (empty start, present end)
};
reasoning_mode_type reasoning_mode = REASONING_NONE;
std::string reasoning_start; // e.g., "<think>", "<|START_THINKING|>"
std::string reasoning_end; // e.g., "</think>", "<|END_THINKING|>"
// Content wrapping mode
enum content_mode_type {
CONTENT_PLAIN, // No content markers
CONTENT_ALWAYS_WRAPPED, // <response>...</response> always present
CONTENT_WRAPPED_WITH_REASONING, // Content wrapped only when reasoning present
};
content_mode_type content_mode = CONTENT_PLAIN;
std::string content_start; // e.g., "<response>", "<|START_RESPONSE|>"
std::string content_end; // e.g., "</response>", "<|END_RESPONSE|>"
};
tool_call_structure (Phase 2 Result)
Describes how the template formats tool calls:
struct tool_call_structure {
bool supports_tools = false;
// Container markers (what wraps all tool calls)
std::string tool_section_start; // e.g., "<tool_call>", "[TOOL_CALLS]", "<TOOLCALL>", ""
std::string tool_section_end; // e.g., "</tool_call>", "]", "</TOOLCALL>", ""
// Function format (how individual functions are structured)
enum function_format {
FUNC_JSON_OBJECT, // {"name": "X", "arguments": {...}}
FUNC_TAG_WITH_NAME, // <function=X>{...}</function>
FUNC_TAG_NAME_ONLY, // <X>...</X> where X is function name (rare)
FUNC_PREFIXED_INDEXED, // <|tool_call_begin|>functions.X:0<|tool_call_argument_begin|>{...}<|tool_call_end|>
FUNC_NAME_AS_KEY, // [{"function_name": {...arguments...}}] (Apertus-style)
FUNC_BRACKET_TAG, // [TOOL_CALLS]X[CALL_ID]id[ARGS]{...} (Mistral Small 3.2 style)
FUNC_RECIPIENT_BASED, // >>>recipient\n{content} where recipient is "all" (content) or function name (tools)
FUNC_MARKDOWN_CODE_BLOCK, // Action:\n```json\n[{"tool_name": "X", ...}]\n``` (Cohere Command-R Plus)
};
function_format function_format = FUNC_JSON_OBJECT;
// For FUNC_JSON_OBJECT format - field names (may vary between templates)
std::string name_field = "name"; // Could be "tool_name", "function"
std::string args_field = "arguments"; // Could be "parameters", "params", "input"
std::string id_field; // Optional: "id", "tool_call_id", ""
// For FUNC_TAG_WITH_NAME format
std::string function_prefix; // e.g., "<function="
std::string function_suffix; // e.g., ">"
std::string function_close; // e.g., "</function>"
// For FUNC_PREFIXED_INDEXED format (e.g., Kimi-K2)
std::string per_call_start; // e.g., "<|tool_call_begin|>"
std::string function_namespace; // e.g., "functions." (prefix before function name)
std::string args_marker; // e.g., "<|tool_call_argument_begin|>"
std::string per_call_end; // e.g., "<|tool_call_end|>"
// For FUNC_BRACKET_TAG format (e.g., Mistral Small 3.2)
std::string id_marker; // e.g., "[CALL_ID]" - marker before tool call ID
// For FUNC_MARKDOWN_CODE_BLOCK format (Cohere Command-R Plus)
std::string code_block_marker; // e.g., "Action:" - text marker before code block
std::string code_block_language; // e.g., "json" - language identifier in code fence
// Argument format (how arguments are structured within a function)
enum argument_format {
ARGS_JSON, // Standard JSON object: {"key": "value", ...}
ARGS_TAGGED, // XML-style: <param=key>value</param>
ARGS_KEY_VALUE_TAGS, // <arg_key>key</arg_key><arg_value>value</arg_value> (GLM-4.6)
};
argument_format argument_format = ARGS_JSON;
// For ARGS_TAGGED format
std::string arg_prefix; // e.g., "<param=", "<parameter="
std::string arg_suffix; // e.g., ">"
std::string arg_close; // e.g., "</param>", "</parameter>"
std::string arg_separator; // e.g., "", "\n"
// Flag: template renders null content as "None" string, requires empty string instead
bool requires_nonnull_content = false;
};
Analysis Flow
Template
|
v
Phase 1: analyze_content_structure()
|-- detect_reasoning_markers() - compare outputs with reasoning_content vs without
|-- detect_content_markers() - render with content and detect wrapping
|-- detect_reasoning_mode() - check if prompt ends with open tag
|
v
content_structure
|
v
Phase 2: analyze_tool_structure()
|-- Check minja.supports_tool_calls
|-- Differential analysis for tool patterns
|-- Classify function format (JSON vs tagged)
|-- Classify argument format (JSON vs tagged)
|
v
tool_call_structure
|
v
generate_parser(content_structure, tool_call_structure)
|-- build_reasoning_block(content_structure)
|-- build_content_block(content_structure)
|-- build_tool_section(tool_call_structure, tools)
|-- Compose into final parser
|
v
common_chat_params (parser, grammar, triggers, preserved_tokens)
Entry Point
The mechanism starts in common/chat.cpp, in common_chat_templates_apply_jinja:
// 1. Analyze the template (two-phase)
template_analysis_result analysis = template_analyzer::analyze_template(tmpl);
// 2. Generate the parser and grammar
auto auto_params = universal_peg_generator::generate_parser(analysis, tmpl, params);
// 3. Use if it provides more than basic content handling
if (auto_params.format != COMMON_CHAT_FORMAT_CONTENT_ONLY ||
auto_params.thinking_forced_open ||
!auto_params.parser.empty()) {
return auto_params;
}
Builder Methods
The unified builder (common_chat_peg_unified_builder) provides high-level methods:
- build_reasoning_block(cs, reasoning_format, thinking_forced_open) - Build reasoning parser
- build_content_block(cs, reasoning_format) - Build content parser
- build_tool_section(ts, tools, parallel_tool_calls, force_tool_calls) - Build tool section
- build_function(ts, name, schema) - Build single function parser
- build_arguments(ts, schema) - Build arguments parser
Key Templates Supported
- Granite - <think></think> + <response></response> with tool calls
- Nemotron - JSON tools with <TOOLCALL> wrapper
- Qwen/Hermes - XML-style <function=X><param=key> format
- Command-R7B - <|START_THINKING|>/<|START_RESPONSE|> + <|START_ACTION|> tools
- DeepSeek R1 - Forced thinking + complex tools
- Mistral Nemo - [TOOL_CALLS] wrapper
- MiniMax - <minimax:tool_call> wrapper with XML tools
- GLM-4.6 - <think> + <tool_call>name\n<arg_key>...<arg_value>... format
- Kimi-K2 - FUNC_PREFIXED_INDEXED format with namespace and indices
- Mistral Small 3.2 - FUNC_BRACKET_TAG format with [TOOL_CALLS] markers
- Functionary v3.2 - FUNC_RECIPIENT_BASED format with >>> routing
Files
| File | Purpose |
|---|---|
| common/chat-auto-parser.h | Data structures and API declarations |
| common/chat-auto-parser-analyzer.cpp | Phase 1 and Phase 2 analysis implementation |
| common/chat-auto-parser-generator.cpp | PEG parser generator |
| common/chat-auto-parser-helpers.h/cpp | Shared helper functions |
| common/chat-peg-parser.h/cpp | Unified builder and mapper classes |
| common/chat.cpp | Main entry point and wire-up |
Algorithm Details
Phase 1: Content & Reasoning Analysis
Reasoning Detection (4 Methods)
Method 1: Differential Reasoning Content Analysis
- Render template with reasoning_content field present vs absent
- Compare outputs to find markers between THOUGHT_MARKER and CONTENT_MARKER
- If only closing tag found, derive opening tag using patterns:
  - XML: </tag> → <tag>
  - Special tokens: <|END_X|> → <|START_X|>, <|/X|> → <|X|>
- Handles various tag formats including XML and special token formats
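The opening-tag derivation above can be sketched as a small string helper. This is only an illustration of the three listed patterns; `derive_opening_tag` is a hypothetical name, not the function used in the analyzer, and the order in which the patterns are tried is an assumption.

```cpp
#include <string>

// Hypothetical helper (not the analyzer's actual function): derive an opening
// reasoning tag from a closing one using the three patterns listed above.
// Returns an empty string when no pattern applies.
std::string derive_opening_tag(const std::string & closing) {
    // Special tokens: <|END_X|> -> <|START_X|>
    const std::string end_prefix = "<|END_";
    if (closing.rfind(end_prefix, 0) == 0) {
        return "<|START_" + closing.substr(end_prefix.size());
    }
    // Special tokens: <|/X|> -> <|X|>
    const std::string slash_prefix = "<|/";
    if (closing.rfind(slash_prefix, 0) == 0) {
        return "<|" + closing.substr(slash_prefix.size());
    }
    // XML: </tag> -> <tag>
    const std::string xml_prefix = "</";
    if (closing.rfind(xml_prefix, 0) == 0) {
        return "<" + closing.substr(xml_prefix.size());
    }
    return "";
}
```

The `rfind(prefix, 0) == 0` idiom is a starts-with check that works before C++20. Tags that match none of the patterns (for example bracket-style `[/THINK]`) yield an empty string in this sketch.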
Method 2: Enable-Thinking Toggle Analysis
- Toggle enable_thinking context variable between true/false
- Detects differences in generated prompts
- Handles two scenarios:
  - Normal case: enable_thinking=true adds reasoning markers
  - Reverse case: enable_thinking=false adds empty thinking block (GLM-4.6 style)
- Uses string difference analysis to extract markers
- Validates extracted tags against blacklist of role markers
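One simple way to do the string difference step is to strip the common prefix and suffix of the two rendered prompts and keep what remains. The sketch below assumes that approach; `diff_result` and `diff_middle` are hypothetical names, and the real analyzer may compute differences differently.

```cpp
#include <algorithm>
#include <string>

// Hypothetical result type: the parts of each string that are not shared.
struct diff_result {
    std::string left;   // text present only in the first string
    std::string right;  // text present only in the second string
};

// Strip the longest common prefix and the longest common suffix (without
// letting the two overlap) and return what is left of each input.
diff_result diff_middle(const std::string & a, const std::string & b) {
    const size_t shortest = std::min(a.size(), b.size());

    size_t prefix = 0;
    while (prefix < shortest && a[prefix] == b[prefix]) {
        prefix++;
    }

    size_t suffix = 0;
    const size_t max_suffix = shortest - prefix;
    while (suffix < max_suffix &&
           a[a.size() - 1 - suffix] == b[b.size() - 1 - suffix]) {
        suffix++;
    }

    return {
        a.substr(prefix, a.size() - prefix - suffix),
        b.substr(prefix, b.size() - prefix - suffix),
    };
}
```

With prompts rendered for enable_thinking=false and enable_thinking=true, `right` holds the markers added in the normal case; in the reverse (GLM-4.6 style) case the extra text shows up on the enable_thinking=false side instead.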
Method 3: Prompt Ending Analysis
- Checks if prompt ends with unclosed reasoning tag
- Looks for trailing tags in prompt with enable_thinking=true
- Differentiates between open tags (<think>) and close tags (</think>)
- Validates reasoning-like patterns (contains "think", "reason", "thought")
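The checks in Method 3 can be sketched roughly as follows. This is a simplified illustration that only handles angle-bracket tags; `trailing_tag`, `find_trailing_reasoning_tag`, and the caller-supplied blacklist are hypothetical, and the keyword list is taken from the bullet above.

```cpp
#include <algorithm>
#include <cctype>
#include <initializer_list>
#include <string>
#include <vector>

enum class trailing_tag_kind { NONE, OPEN, CLOSE };

struct trailing_tag {
    trailing_tag_kind kind = trailing_tag_kind::NONE;
    std::string       tag;
};

// Case-insensitive check for the reasoning-like keywords named above.
static bool looks_like_reasoning(std::string tag) {
    std::transform(tag.begin(), tag.end(), tag.begin(),
                   [](unsigned char c) { return (char) std::tolower(c); });
    for (const char * kw : { "think", "reason", "thought" }) {
        if (tag.find(kw) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Hypothetical sketch: inspect the last <...> tag of a rendered prompt.
trailing_tag find_trailing_reasoning_tag(const std::string & prompt,
                                         const std::vector<std::string> & blacklist) {
    trailing_tag result;
    // Ignore trailing whitespace; the prompt must then end with '>'.
    const size_t end = prompt.find_last_not_of(" \t\r\n");
    if (end == std::string::npos || prompt[end] != '>') {
        return result;
    }
    const size_t start = prompt.rfind('<', end);
    if (start == std::string::npos) {
        return result;
    }
    const std::string tag = prompt.substr(start, end - start + 1);
    // Reject role markers / system tokens and anything not reasoning-like.
    if (std::find(blacklist.begin(), blacklist.end(), tag) != blacklist.end() ||
        !looks_like_reasoning(tag)) {
        return result;
    }
    const bool is_close = tag.rfind("</", 0) == 0 ||
                          tag.rfind("<|/", 0) == 0 ||
                          tag.rfind("<|END_", 0) == 0;
    result.tag  = tag;
    result.kind = is_close ? trailing_tag_kind::CLOSE : trailing_tag_kind::OPEN;
    return result;
}
```

An OPEN result corresponds to a prompt that ends with an unclosed reasoning tag; a CLOSE result means the template already emitted a closed (possibly empty) thinking block.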
Method 4: Adjacent Tag Pair Detection
- Looks for patterns like <think></think>, <|START_THINKING|><|END_THINKING|>, [think][/think]
- Searches for predefined tag patterns in prompt
- Validates tags are adjacent with only whitespace between
- Supports both simple and complex token formats
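A minimal sketch of the adjacency check, assuming the predefined patterns are supplied as (open, close) string pairs. `find_adjacent_pair` is a hypothetical name and the pair list in the usage note is illustrative only.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: return the index of the first (open, close) pair that
// appears in the prompt with only whitespace between the two tags, or -1.
// Assumes every open tag is non-empty.
int find_adjacent_pair(const std::string & prompt,
                       const std::vector<std::pair<std::string, std::string>> & pairs) {
    for (size_t i = 0; i < pairs.size(); i++) {
        const std::string & open  = pairs[i].first;
        const std::string & close = pairs[i].second;
        size_t pos = prompt.find(open);
        while (pos != std::string::npos) {
            // Skip whitespace after the opening tag.
            size_t j = pos + open.size();
            while (j < prompt.size() && std::isspace((unsigned char) prompt[j])) {
                j++;
            }
            // The closing tag must start right here.
            if (prompt.compare(j, close.size(), close) == 0) {
                return (int) i;
            }
            pos = prompt.find(open, pos + 1);
        }
    }
    return -1;
}
```

For example, with the pairs `<think>`/`</think>`, `<|START_THINKING|>`/`<|END_THINKING|>` and `[think]`/`[/think]`, a prompt ending in `<think>\n\n</think>\n\n` matches the first pair, while `<think>abc</think>` matches none because non-whitespace text separates the tags.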
Content Detection Algorithm
- Dual-Mode Rendering: Render template with content marker in both thinking-enabled and thinking-disabled modes
- Pattern Matching: Search for known content wrapper patterns:
  - <|START_RESPONSE|> / <|END_RESPONSE|>
  - <response> / </response>
  - <output> / </output>
  - <answer> / </answer>
  - <|CHATBOT_TOKEN|> / <|END_OF_TURN_TOKEN|>
- Mode Classification:
  - CONTENT_ALWAYS_WRAPPED: Found in both thinking modes
  - CONTENT_WRAPPED_WITH_REASONING: Found only with thinking enabled
  - CONTENT_PLAIN: No wrapping detected
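The mode classification can be illustrated with a small sketch that checks one wrapper pair against the two renderings. The enum mirrors content_mode_type from the data structures section; `is_wrapped` and `classify_content_mode` are hypothetical helpers, and treating "wrapped only with thinking disabled" as CONTENT_PLAIN is an assumption, since the text above does not cover that case.

```cpp
#include <string>

enum content_mode_type {
    CONTENT_PLAIN,
    CONTENT_ALWAYS_WRAPPED,
    CONTENT_WRAPPED_WITH_REASONING,
};

// True if `rendered` contains `start` somewhere before the content marker and
// `end` somewhere after it.
static bool is_wrapped(const std::string & rendered, const std::string & marker,
                       const std::string & start, const std::string & end) {
    const size_t m = rendered.find(marker);
    if (m == std::string::npos || m < start.size()) {
        return false;
    }
    // Last occurrence of `start` that ends at or before the marker.
    if (rendered.rfind(start, m - start.size()) == std::string::npos) {
        return false;
    }
    return rendered.find(end, m + marker.size()) != std::string::npos;
}

// Hypothetical sketch of the classification for a single wrapper pair.
content_mode_type classify_content_mode(const std::string & with_thinking,
                                        const std::string & without_thinking,
                                        const std::string & marker,
                                        const std::string & start,
                                        const std::string & end) {
    const bool on  = is_wrapped(with_thinking, marker, start, end);
    const bool off = is_wrapped(without_thinking, marker, start, end);
    if (on && off) {
        return CONTENT_ALWAYS_WRAPPED;
    }
    if (on) {
        return CONTENT_WRAPPED_WITH_REASONING;
    }
    return CONTENT_PLAIN;  // assumption: includes the "only when disabled" case
}
```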
Reasoning Mode Detection
- REASONING_FORCED_OPEN:
  - Explicit: Prompt ends with reasoning start marker (e.g., <think>).
  - Implicit: reasoning end marker is present but start marker is empty (e.g., [BEGIN FINAL RESPONSE]).
- REASONING_OPTIONAL: Markers present but not forced.
- REASONING_NONE: No markers detected.
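These rules map onto a short decision function. The sketch below mirrors the reasoning_mode_type enum from the data structures section; `detect_mode_sketch` is a hypothetical name, and ignoring trailing whitespace before the ends-with check is an assumption.

```cpp
#include <string>

enum reasoning_mode_type {
    REASONING_NONE,
    REASONING_OPTIONAL,
    REASONING_FORCED_OPEN,
};

// Hypothetical sketch of the three rules above.
reasoning_mode_type detect_mode_sketch(const std::string & prompt,
                                       const std::string & reasoning_start,
                                       const std::string & reasoning_end) {
    if (reasoning_start.empty() && reasoning_end.empty()) {
        return REASONING_NONE;
    }
    // Implicit: an end marker exists but there is no start marker.
    if (reasoning_start.empty()) {
        return REASONING_FORCED_OPEN;
    }
    // Explicit: the prompt (ignoring trailing whitespace) ends with the start marker.
    const size_t last = prompt.find_last_not_of(" \t\r\n");
    const std::string trimmed =
        last == std::string::npos ? std::string() : prompt.substr(0, last + 1);
    if (trimmed.size() >= reasoning_start.size() &&
        trimmed.compare(trimmed.size() - reasoning_start.size(),
                        reasoning_start.size(), reasoning_start) == 0) {
        return REASONING_FORCED_OPEN;
    }
    return REASONING_OPTIONAL;
}
```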
Phase 2: Tool Call Structure Analysis
Differential Analysis Algorithm
Test Payload Strategy:
- Base: User + Assistant with content only (no tools)
- Tool 1: User + Assistant with tool_calls (empty args)
- Tool 2: User + Assistant with tool_calls (with args)
- Tool 3: User + Assistant with multiple tool calls
Pattern Extraction Process:
- Compute string differences between base and tool outputs
- Use test_function_name as reliable search anchor (using rfind for last occurrence)
- Extract structural elements:
  - tool_call_opener: Common prefix before function name
  - tool_call_closer: Common suffix after function calls
  - function_opener: Tag immediately before function name
  - function_closer: Tag after function content
  - parameter_key_prefix/suffix: Argument wrapping patterns
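The anchoring step can be sketched as a split at the last occurrence of the test function name. `anchor_split` and `split_at_last_occurrence` are hypothetical names; the real extraction goes on to derive the individual openers and closers from the two halves.

```cpp
#include <string>

// Hypothetical result of anchoring on the function name.
struct anchor_split {
    bool        found = false;
    std::string before;  // everything up to the last occurrence of the name
    std::string after;   // everything after that occurrence
};

// Split the rendered output at the LAST occurrence of the function name.
// Using rfind matters because the same name can also appear earlier in the
// output, outside the tool call itself.
anchor_split split_at_last_occurrence(const std::string & rendered,
                                      const std::string & function_name) {
    anchor_split out;
    const size_t pos = rendered.rfind(function_name);
    if (pos == std::string::npos) {
        return out;
    }
    out.found  = true;
    out.before = rendered.substr(0, pos);
    out.after  = rendered.substr(pos + function_name.size());
    return out;
}
```

For a rendering such as `tools: test_function_name\n<tool_call>{"name": "test_function_name"}</tool_call>`, the split lands on the second occurrence, so `after` is `"}</tool_call>` and `before` ends with `<tool_call>{"name": "`.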
Format Classification Logic
FORMAT_JSON_NATIVE:
- Detected by {"name": pattern in tool_call_opener
- Or XML markers with JSON structure
FORMAT_XML_CONSTRUCTED:
- function_opener starts with <
- No substantial parameter markers
FORMAT_RECIPIENT_BASED:
- tool_call_start_marker == function_opener
- No parameter markers
- Opener doesn't start with structural chars
FORMAT_BRACKET_TAG:
- function_name_suffix contains bracket tags like [CALL_ID]...[ARGS]
- tool_call_start_marker matches [TOOL_CALLS] pattern
FORMAT_PREFIXED_INDEXED:
- function_opener ends with . (namespace separator)
- function_name_suffix starts with : followed by digit
- Example: functions.name:0<|tool_call_argument_begin|>
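As an illustration, the two FORMAT_PREFIXED_INDEXED conditions translate directly into a predicate. `is_prefixed_indexed` is a hypothetical name; the analyzer presumably evaluates this alongside the other format checks, in an order that is not specified here.

```cpp
#include <cctype>
#include <string>

// Hypothetical predicate for the prefixed-indexed rules listed above:
//   - function_opener ends with '.' (namespace separator)
//   - function_name_suffix starts with ':' followed by a digit
bool is_prefixed_indexed(const std::string & function_opener,
                         const std::string & function_name_suffix) {
    if (function_opener.empty() || function_opener.back() != '.') {
        return false;
    }
    return function_name_suffix.size() >= 2 &&
           function_name_suffix[0] == ':' &&
           std::isdigit((unsigned char) function_name_suffix[1]) != 0;
}
```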
Specialized Format Handling
FUNC_PREFIXED_INDEXED (Kimi-K2):
- Splits function_opener at last > to get per_call_start + function_namespace
- Extracts args_marker from function_name_suffix
- Derives per_call_end by matching structural patterns in tool_call_closer
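The first two steps can be sketched with plain string operations. `opener_split`, `split_function_opener`, and `extract_args_marker` are hypothetical names, and the sketch assumes function_name_suffix consists of the `:N` index followed only by the argument marker.

```cpp
#include <cctype>
#include <string>

struct opener_split {
    std::string per_call_start;      // e.g. "<|tool_call_begin|>"
    std::string function_namespace;  // e.g. "functions."
};

// Split function_opener at the last '>' as described above.
opener_split split_function_opener(const std::string & function_opener) {
    const size_t pos = function_opener.rfind('>');
    if (pos == std::string::npos) {
        // No tag found: treat the whole opener as the namespace.
        return { "", function_opener };
    }
    return { function_opener.substr(0, pos + 1), function_opener.substr(pos + 1) };
}

// Drop the leading ":<digits>" index from function_name_suffix; what remains
// is taken as the args marker (assumption: nothing else follows it).
std::string extract_args_marker(const std::string & function_name_suffix) {
    size_t i = 0;
    if (i < function_name_suffix.size() && function_name_suffix[i] == ':') {
        i++;
    }
    while (i < function_name_suffix.size() &&
           std::isdigit((unsigned char) function_name_suffix[i])) {
        i++;
    }
    return function_name_suffix.substr(i);
}
```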
FUNC_TAG_WITH_NAME (Functionary/Nemotron):
- Detects nested vs non-nested formats
- Uses overlap detection between tool_section_start and function_prefix
- Handles double-wrapping prevention
ARGS_KEY_VALUE_TAGS (GLM-4.6):
- Detects <arg_key>key</arg_key><arg_value>value</arg_value> pattern
- Cleans up suffix to extract just the key closer
FUNC_RECIPIENT_BASED (Functionary v3.2):
- Detects >>> recipient delimiter format
- Routes to "all" for content, function name for tools
- Uses same delimiter for both content and tool routing
FUNC_BRACKET_TAG (Mistral Small 3.2/Devstral):
- Detects [TOOL_CALLS]function_name[ARGS]{...} pattern
- Optional [CALL_ID]id marker for tool call identification
- No section wrapper - each call starts independently
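To make the FUNC_BRACKET_TAG layout concrete, here is a string-level sketch that splits a single call into name, optional ID, and raw arguments. It is not how the generated PEG parser works; `bracket_call` and `parse_bracket_tag_call` are hypothetical, the markers are hard-coded from the pattern above, and the arguments are taken verbatim to the end of the input without JSON validation.

```cpp
#include <string>

struct bracket_call {
    bool        ok = false;
    std::string name;
    std::string id;    // empty when no [CALL_ID] marker is present
    std::string args;  // raw text after [ARGS], not validated
};

// Hypothetical sketch: split "[TOOL_CALLS]name[CALL_ID]id[ARGS]{...}" or
// "[TOOL_CALLS]name[ARGS]{...}" into its parts. Handles one call only.
bracket_call parse_bracket_tag_call(const std::string & text) {
    const std::string start       = "[TOOL_CALLS]";
    const std::string id_marker   = "[CALL_ID]";
    const std::string args_marker = "[ARGS]";

    bracket_call out;
    if (text.rfind(start, 0) != 0) {
        return out;  // each call must begin with the start marker
    }
    const size_t name_begin = start.size();
    const size_t args_pos   = text.find(args_marker, name_begin);
    if (args_pos == std::string::npos) {
        return out;
    }
    const size_t id_pos = text.find(id_marker, name_begin);
    if (id_pos != std::string::npos && id_pos < args_pos) {
        const size_t id_begin = id_pos + id_marker.size();
        out.name = text.substr(name_begin, id_pos - name_begin);
        out.id   = text.substr(id_begin, args_pos - id_begin);
    } else {
        out.name = text.substr(name_begin, args_pos - name_begin);
    }
    out.args = text.substr(args_pos + args_marker.size());
    out.ok   = !out.name.empty();
    return out;
}
```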
Generator Algorithms
Unified Parser Building
Composition Strategy:
// Standard format
sequence({ reasoning, space(), content, space(), tools, space(), content, end() })
// With section markers
sequence({ reasoning, space(), content_until(section_start), space(), tools, space(), content, end() })
// Forced thinking handling: reasoning is wrapped as optional(reasoning)
// when thinking_forced_open is set and tools are present
Trigger Word Detection:
- Uses tool_section_start as primary trigger
- Falls back to function_prefix or per_call_start
- Raw JSON uses regex pattern trigger
Lazy Grammar Optimization:
- Enabled by default for performance
- Disabled when thinking forced open
- Disabled when no clear trigger word exists
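The trigger and lazy-grammar rules above can be combined into one small decision function. This is a sketch under assumptions: `trigger_choice` and `choose_trigger` are hypothetical, the order between function_prefix and per_call_start in the fallback is a guess, and the regex trigger for raw JSON is not covered.

```cpp
#include <string>

struct trigger_choice {
    std::string word;          // chosen trigger word, empty if none
    bool        lazy = false;  // whether lazy grammar would be enabled
};

// Hypothetical sketch of trigger selection plus the lazy-grammar conditions.
trigger_choice choose_trigger(const std::string & tool_section_start,
                              const std::string & function_prefix,
                              const std::string & per_call_start,
                              bool thinking_forced_open) {
    trigger_choice out;
    if (!tool_section_start.empty()) {
        out.word = tool_section_start;  // primary trigger
    } else if (!function_prefix.empty()) {
        out.word = function_prefix;     // fallback (order is an assumption)
    } else {
        out.word = per_call_start;      // may be empty: no clear trigger
    }
    // Lazy grammar: on by default, off when thinking is forced open or when
    // no trigger word could be determined.
    out.lazy = !thinking_forced_open && !out.word.empty();
    return out;
}
```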
Testing & Debugging
Comprehensive Test Coverage
The test suite covers:
Reasoning Models:
- Qwen-QwQ-32B (forced-open thinking)
- DeepSeek R1 variants (reasoning only)
- IBM Granite (reasoning + tools)
- ByteDance Seed-OSS (custom reasoning tags)
- Ministral-3-14B-Reasoning
- llama-cpp-deepseek-r1
Tool Call Formats:
- JSON: Llama 3.x, Mistral Nemo, Hermes, MiMo-VL
- XML: Nemotron, Qwen3-Coder, MiniMax
- Tagged: GLM-4.6 (key-value tags)
- Bracket-tag: Mistral Small 3.2, Devstral
- Prefixed-indexed: Kimi-K2 variants
- Name-as-key: Apertus-8B
- Recipient-based: Functionary v3.2
Edge Cases:
- Streaming/partial parsing
- Empty content with tools
- Parallel tool calls
- Forced thinking mode
- Multi-byte Unicode markers
- Null content handling
- Multi-line code in tool arguments
- Custom reasoning tags (ByteDance Seed-OSS)
Debug Tools
Template Debugger: tests/debug-template-parser.cpp
- Usage: ./bin/debug-template-parser path/to/template.jinja
- Shows detected format, markers, generated parser, and GBNF grammar
Debug Logging: Enable with LLAMA_LOG_VERBOSITY=2
- Shows detailed analysis steps
- Displays pattern extraction results
- Lists generated parser structure
PEG Test Builder: Fluent API for creating test cases
auto tst = peg_tester("template.jinja");
tst.test("input")
.reasoning_format(COMMON_REASONING_FORMAT_AUTO)
.tools({tool})
.expect(expected_message)
.run();
Adding Support for New Templates
To support a new template format:
- If it follows standard patterns - The auto-parser should detect it automatically
- If it has unique markers - Add the markers to the detection patterns in:
  - detect_reasoning_markers() for reasoning tags
  - detect_content_markers() for content wrappers
  - extract_patterns_from_differences() for tool call patterns
- If it needs special handling - Add a dedicated handler in chat.cpp before the auto-parser block
Edge Cases and Quirks
- Forced Thinking: If enable_thinking is true and the template has already opened a thought block (e.g., the prompt ends with <think>), the parser enters "forced thinking" mode, where it immediately expects reasoning content.
- Ambiguous Content: Templates that mix content and tool calls without clear delimiters can be tricky. The analyzer looks for start/end patterns that are common across multiple rendered examples to stay robust.
- Double Wrapping: Some templates (e.g., Functionary) use the same string for both the tool section start and the function prefix (e.g., <function=). The analyzer detects this overlap and prevents double-wrapping in the generated parser.
- Null Content Rendering: Some templates render null content as the Python string "None". The analyzer detects this and patches the content to an empty string.
- Multi-byte Unicode Markers: Some templates use special Unicode characters in markers that require careful handling in GBNF generation.
State of the Autoparser (Jan 2026)
As of January 2026, the unified auto-parser successfully handles major template families including DeepSeek V3/R1, Llama 3.x (native JSON), GLM-4/4.6, and standard XML/JSON formats. It also supports Functionary v3.1/v3.2, Mistral variants, and specialized formats like Kimi-K2's prefixed-indexed structure.
Tested Templates
The following templates have active tests in tests/test-chat.cpp:
| Template | Format | Notes |
|---|---|---|
| DeepSeek V3.1 | FUNC_JSON_OBJECT | Forced thinking mode |
| DeepSeek R1 Distill (Llama/Qwen) | Reasoning only | Forced-open thinking |
| llama-cpp-deepseek-r1 | Reasoning only | Forced-open thinking |
| GLM-4.6 | ARGS_KEY_VALUE_TAGS | <tool_call>name\n<arg_key>...<arg_value>... format |
| Kimi-K2 / Kimi-K2-Instruct / Kimi-K2-Thinking | FUNC_PREFIXED_INDEXED | functions.name:0 with special markers |
| Apertus-8B-Instruct | FUNC_NAME_AS_KEY | {"function_name": {...}} format |
| MiniMax-M2 | FUNC_TAG_WITH_NAME | XML invoke with parameter tags |
| NVIDIA-Nemotron-Nano-v2 | FUNC_JSON_OBJECT | <TOOLCALL> wrapper (nested) |
| Mistral-Nemo-Instruct-2407 | FUNC_JSON_OBJECT | [TOOL_CALLS] wrapper with id field |
| Functionary v3.1 | FUNC_TAG_WITH_NAME | <function=X> non-nested format |
| Functionary v3.2 | FUNC_RECIPIENT_BASED | >>> recipient delimiter format |
| MiMo-VL / Hermes 3 / Qwen 2.5 | FUNC_JSON_OBJECT | <tool_call> wrapper |
| Apriel 1.5 | FUNC_JSON_OBJECT | <tool_calls> wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning only | Implicit reasoning start |
| Cohere Command-R7B | FUNC_JSON_OBJECT | START_RESPONSE/ACTION/THINKING markers |
| Mistral Small 3.2 | FUNC_BRACKET_TAG | [TOOL_CALLS]func[ARGS]{...} with ID |
| Devstral | FUNC_BRACKET_TAG | [TOOL_CALLS]func[ARGS]{...} without ID |
| Ministral-3-14B-Reasoning | Custom reasoning | [THINK]...[/THINK] tags |
| IBM Granite | FUNC_JSON_OBJECT | <think></think> + <response></response> |
| ByteDance Seed-OSS | FUNC_TAG_WITH_NAME | Custom <seed:think> and <seed:tool_call> tags |
| Qwen3-Coder | FUNC_TAG_WITH_NAME | XML-style tool format |
| Cohere Command-R Plus | FUNC_MARKDOWN_CODE_BLOCK | Action:\n```json\n[...]\n``` format |
Currently Unsupported Templates
| Template Family | Model / Variant | Issue Description |
|---|---|---|
| OpenAI | GPT-OSS | Complex channel markers need new format |
Templates Without Tool Support
Some templates genuinely don't support tool calls (this is not a detection bug):
- Phi 3.5 Mini - The official template has no tool handling. Use Phi-4-mini-instruct for function calling, or community fine-tuned versions.
- Google Gemma 2 2B - Pure instruction-following model without tool capabilities.
TODO / Roadmap
- Fix OpenAI GPT-OSS: Add FUNC_CHANNEL_BASED format for channel marker structure.
- Fix Cohere Command-R Plus (done): Added FUNC_MARKDOWN_CODE_BLOCK format for the Action:\n```json structure.
Recent Additions (Dec 2025 - Jan 2026)
- FUNC_RECIPIENT_BASED: Support for Functionary v3.2's >>> recipient delimiter format
- FUNC_BRACKET_TAG: Support for Mistral Small 3.2 and Devstral's [TOOL_CALLS]... format
- Enhanced Content Detection: Better handling of custom reasoning tags and content wrappers
- Improved Streaming Support: Better handling of partial parsing for all supported formats
- Custom Tag Support: Support for non-standard reasoning tags like <seed:think> (ByteDance)
- Multi-line Tool Arguments: Better parsing of complex tool arguments with code blocks
- FUNC_MARKDOWN_CODE_BLOCK: Support for Cohere Command-R Plus markdown code block format
- Implicit Reasoning Support: Support for templates where reasoning starts implicitly without a start marker.
The auto-parser now successfully handles 25+ different template formats across reasoning-only, tool-calling, and hybrid models, with comprehensive test coverage ensuring robust parsing across streaming and non-streaming scenarios.