# Unified Auto-Parser Architecture

The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.

## Overview

The unified auto-parser uses a two-phase incremental analysis approach:

1. **Phase 1: Content & Reasoning Analysis** - Analyzes how the template handles basic content and reasoning, without considering tools
2. **Phase 2: Tool Call Analysis** - Analyzes tool calling patterns, layered on top of Phase 1

## Data Structures

### content_structure (Phase 1 Result)

Describes how the template handles content and reasoning:

```cpp
struct content_structure {
    enum reasoning_mode_type {
        REASONING_NONE,         // No reasoning markers detected
        REASONING_OPTIONAL,     // ... may appear before content
        REASONING_FORCED_OPEN,  // Template ends with open reasoning tag OR starts implicitly (empty start, present end)
    };

    reasoning_mode_type reasoning_mode = REASONING_NONE;
    std::string reasoning_start;  // e.g., "", "<|START_THINKING|>"
    std::string reasoning_end;    // e.g., "", "<|END_THINKING|>"

    // Content wrapping mode
    enum content_mode_type {
        CONTENT_PLAIN,                   // No content markers
        CONTENT_ALWAYS_WRAPPED,          // ... always present
        CONTENT_WRAPPED_WITH_REASONING,  // Content wrapped only when reasoning present
    };

    content_mode_type content_mode = CONTENT_PLAIN;
    std::string content_start;  // e.g., "", "<|START_RESPONSE|>"
    std::string content_end;    // e.g., "", "<|END_RESPONSE|>"
};
```

### tool_call_structure (Phase 2 Result)

Describes how the template formats tool calls:

```cpp
struct tool_call_structure {
    bool supports_tools = false;

    // Container markers (what wraps all tool calls)
    std::string tool_section_start;  // e.g., "", "[TOOL_CALLS]", "", ""
    std::string tool_section_end;    // e.g., "", "]", "", ""

    // Function format (how individual functions are structured)
    enum function_format {
        FUNC_JSON_OBJECT,          // {"name": "X", "arguments": {...}}
        FUNC_TAG_WITH_NAME,        // {...}
        FUNC_TAG_NAME_ONLY,        // ... where X is function name (rare)
        FUNC_PREFIXED_INDEXED,     // <|tool_call_begin|>functions.X:0<|tool_call_argument_begin|>{...}<|tool_call_end|>
        FUNC_NAME_AS_KEY,          // [{"function_name": {...arguments...}}] (Apertus-style)
        FUNC_BRACKET_TAG,          // [TOOL_CALLS]X[CALL_ID]id[ARGS]{...} (Mistral Small 3.2 style)
        FUNC_RECIPIENT_BASED,      // >>>recipient\n{content} where recipient is "all" (content) or function name (tools)
        FUNC_MARKDOWN_CODE_BLOCK,  // Action:\n```json\n[{"tool_name": "X", ...}]\n``` (Cohere Command-R Plus)
    };
    function_format function_format = FUNC_JSON_OBJECT;

    // For FUNC_JSON_OBJECT format - field names (may vary between templates)
    std::string name_field = "name";       // Could be "tool_name", "function"
    std::string args_field = "arguments";  // Could be "parameters", "params", "input"
    std::string id_field;                  // Optional: "id", "tool_call_id", ""

    // For FUNC_TAG_WITH_NAME format
    std::string function_prefix;  // e.g., ""
    std::string function_close;   // e.g., ""

    // For FUNC_PREFIXED_INDEXED format (e.g., Kimi-K2)
    std::string per_call_start;      // e.g., "<|tool_call_begin|>"
    std::string function_namespace;  // e.g., "functions." (prefix before function name)
    std::string args_marker;         // e.g., "<|tool_call_argument_begin|>"
    std::string per_call_end;        // e.g., "<|tool_call_end|>"

    // For FUNC_BRACKET_TAG format (e.g., Mistral Small 3.2)
    std::string id_marker;  // e.g., "[CALL_ID]" - marker before tool call ID

    // For FUNC_MARKDOWN_CODE_BLOCK format (Cohere Command-R Plus)
    std::string code_block_marker;    // e.g., "Action:" - text marker before code block
    std::string code_block_language;  // e.g., "json" - language identifier in code fence

    // Argument format (how arguments are structured within a function)
    enum argument_format {
        ARGS_JSON,            // Standard JSON object: {"key": "value", ...}
        ARGS_TAGGED,          // XML-style: value
        ARGS_KEY_VALUE_TAGS,  // keyvalue (GLM-4.6)
    };
    argument_format argument_format = ARGS_JSON;

    // For ARGS_TAGGED format
    std::string arg_prefix;     // e.g., ""
    std::string arg_close;      // e.g., "", ""
    std::string arg_separator;  // e.g., "", "\n"

    // Flag: template renders null content as "None" string, requires empty string instead
    bool requires_nonnull_content = false;
};
```

## Analysis Flow

```console
Template
    |
    v
Phase 1: analyze_content_structure()
    |-- detect_reasoning_markers() - compare outputs with reasoning_content vs without
    |-- detect_content_markers() - render with content and detect wrapping
    |-- detect_reasoning_mode() - check if prompt ends with open tag
    |
    v
content_structure
    |
    v
Phase 2: analyze_tool_structure()
    |-- Check minja.supports_tool_calls
    |-- Differential analysis for tool patterns
    |-- Classify function format (JSON vs tagged)
    |-- Classify argument format (JSON vs tagged)
    |
    v
tool_call_structure
    |
    v
generate_parser(content_structure, tool_call_structure)
    |-- build_reasoning_block(content_structure)
    |-- build_content_block(content_structure)
    |-- build_tool_section(tool_call_structure, tools)
    |-- Compose into final parser
    |
    v
common_chat_params (parser, grammar, triggers, preserved_tokens)
```

## Entry Point

The mechanism starts in `common/chat.cpp`, in
`common_chat_templates_apply_jinja`:

```cpp
// 1. Analyze the template (two-phase)
template_analysis_result analysis = template_analyzer::analyze_template(tmpl);

// 2. Generate the parser and grammar
auto auto_params = universal_peg_generator::generate_parser(analysis, tmpl, params);

// 3. Use if it provides more than basic content handling
if (auto_params.format != COMMON_CHAT_FORMAT_CONTENT_ONLY ||
    auto_params.thinking_forced_open ||
    !auto_params.parser.empty()) {
    return auto_params;
}
```

## Builder Methods

The unified builder (`common_chat_peg_unified_builder`) provides high-level methods:

- `build_reasoning_block(cs, reasoning_format, thinking_forced_open)` - Build reasoning parser
- `build_content_block(cs, reasoning_format)` - Build content parser
- `build_tool_section(ts, tools, parallel_tool_calls, force_tool_calls)` - Build tool section
- `build_function(ts, name, schema)` - Build single function parser
- `build_arguments(ts, schema)` - Build arguments parser

## Key Templates Supported

- **Granite** - `` + `` with tool calls
- **Nemotron** - JSON tools with `` wrapper
- **Qwen/Hermes** - XML-style `` format
- **Command-R7B** - `<|START_THINKING|>`/`<|START_RESPONSE|>` + `<|START_ACTION|>` tools
- **DeepSeek R1** - Forced thinking + complex tools
- **Mistral Nemo** - `[TOOL_CALLS]` wrapper
- **MiniMax** - `` wrapper with XML tools
- **GLM-4.6** - `` + `name\n......` format
- **Kimi-K2** - `FUNC_PREFIXED_INDEXED` format with namespace and indices
- **Mistral Small 3.2** - `FUNC_BRACKET_TAG` format with `[TOOL_CALLS]` markers
- **Functionary v3.2** - `FUNC_RECIPIENT_BASED` format with `>>>` routing

## Files

| File | Purpose |
|------|---------|
| `common/chat-auto-parser.h` | Data structures and API declarations |
| `common/chat-auto-parser-analyzer.cpp` | Phase 1 and Phase 2 analysis implementation |
| `common/chat-auto-parser-generator.cpp` | PEG parser generator |
| `common/chat-auto-parser-helpers.h/cpp` | Shared helper functions |
| `common/chat-peg-parser.h/cpp` | Unified builder and mapper classes |
| `common/chat.cpp` | Main entry point and wire-up |

## Algorithm Details

### Phase 1: Content & Reasoning Analysis

#### Reasoning Detection (4 Methods)

**Method 1: Differential Reasoning Content Analysis**

- Render template with `reasoning_content` field present vs absent
- Compare outputs to find markers between `THOUGHT_MARKER` and `CONTENT_MARKER`
- If only closing tag found, derive opening tag using patterns:
  - XML: `` → ``
  - Special tokens: `<|END_X|>` → `<|START_X|>`, `<|/X|>` → `<|X|>`
- Handles various tag formats including XML and special token formats

**Method 2: Enable-Thinking Toggle Analysis**

- Toggle `enable_thinking` context variable between true/false
- Detects differences in generated prompts
- Handles two scenarios:
  - **Normal case**: enable_thinking=true adds reasoning markers
  - **Reverse case**: enable_thinking=false adds empty thinking block (GLM-4.6 style)
- Uses string difference analysis to extract markers
- Validates extracted tags against blacklist of role markers

**Method 3: Prompt Ending Analysis**

- Checks if prompt ends with unclosed reasoning tag
- Looks for trailing tags in prompt with `enable_thinking=true`
- Differentiates between open tags (``) and close tags (``)
- Handles blacklisted tags (role markers, system tokens)
- Validates reasoning-like patterns (contains "think", "reason", "thought")

**Method 4: Adjacent Tag Pair Detection**

- Looks for patterns like ``, `<|START_THINKING|><|END_THINKING|>`, `[think][/think]`
- Searches for predefined tag patterns in prompt
- Validates tags are adjacent with only whitespace between
- Supports both simple and complex token formats

#### Content Detection Algorithm

1. **Dual-Mode Rendering**: Render template with content marker in both thinking-enabled and thinking-disabled modes
2. **Pattern Matching**: Search for known content wrapper patterns:
   - `<|START_RESPONSE|>` / `<|END_RESPONSE|>`
   - `` / ``
   - `` / ``
   - `` / ``
   - `<|CHATBOT_TOKEN|>` / `<|END_OF_TURN_TOKEN|>`
3. **Mode Classification**:
   - `CONTENT_ALWAYS_WRAPPED`: Found in both thinking modes
   - `CONTENT_WRAPPED_WITH_REASONING`: Found only with thinking enabled
   - `CONTENT_PLAIN`: No wrapping detected

#### Reasoning Mode Detection

- **REASONING_FORCED_OPEN**:
  - **Explicit**: Prompt ends with reasoning start marker (e.g., ``).
  - **Implicit**: reasoning end marker is present but start marker is empty (e.g., `[BEGIN FINAL RESPONSE]`).
- **REASONING_OPTIONAL**: Markers present but not forced.
- **REASONING_NONE**: No markers detected.

### Phase 2: Tool Call Structure Analysis

#### Differential Analysis Algorithm

**Test Payload Strategy**:

1. **Base**: User + Assistant with content only (no tools)
2. **Tool 1**: User + Assistant with tool_calls (empty args)
3. **Tool 2**: User + Assistant with tool_calls (with args)
4. **Tool 3**: User + Assistant with multiple tool calls

**Pattern Extraction Process**:

1. Compute string differences between base and tool outputs
2. Use `test_function_name` as reliable search anchor (using `rfind` for last occurrence)
3. Extract structural elements:
   - `tool_call_opener`: Common prefix before function name
   - `tool_call_closer`: Common suffix after function calls
   - `function_opener`: Tag immediately before function name
   - `function_closer`: Tag after function content
   - `parameter_key_prefix/suffix`: Argument wrapping patterns

#### Format Classification Logic

**FORMAT_JSON_NATIVE**:

- Detected by `{"name":` pattern in `tool_call_opener`
- Or XML markers with JSON structure

**FORMAT_XML_CONSTRUCTED**:

- `function_opener` starts with `<`
- No substantial parameter markers

**FORMAT_RECIPIENT_BASED**:

- `tool_call_start_marker == function_opener`
- No parameter markers
- Opener doesn't start with structural chars

**FORMAT_BRACKET_TAG**:

- `function_name_suffix` contains bracket tags like `[CALL_ID]...[ARGS]`
- `tool_call_start_marker` matches `[TOOL_CALLS]` pattern

**FORMAT_PREFIXED_INDEXED**:

- `function_opener` ends with `.` (namespace separator)
- `function_name_suffix` starts with `:` followed by digit
- Example: `functions.name:0<|tool_call_argument_begin|>`

#### Specialized Format Handling

**FUNC_PREFIXED_INDEXED (Kimi-K2)**:

- Splits `function_opener` at last `>` to get `per_call_start` + `function_namespace`
- Extracts `args_marker` from `function_name_suffix`
- Derives `per_call_end` by matching structural patterns in `tool_call_closer`

**FUNC_TAG_WITH_NAME (Functionary/Nemotron)**:

- Detects nested vs non-nested formats
- Uses overlap detection between `tool_section_start` and `function_prefix`
- Handles double-wrapping prevention

**ARGS_KEY_VALUE_TAGS (GLM-4.6)**:

- Detects `keyvalue` pattern
- Cleans up suffix to extract just the key closer

**FUNC_RECIPIENT_BASED (Functionary v3.2)**:

- Detects `>>>` recipient delimiter format
- Routes to "all" for content, function name for tools
- Uses same delimiter for both content and tool routing

**FUNC_BRACKET_TAG (Mistral Small 3.2/Devstral)**:

- Detects `[TOOL_CALLS]function_name[ARGS]{...}` pattern
- Optional `[CALL_ID]id` marker for
  tool call identification
- No section wrapper - each call starts independently

### Generator Algorithms

#### Unified Parser Building

**Composition Strategy**:

```cpp
// Standard format
sequence({ reasoning, space(), content, space(), tools, space(), content, end() })

// With section markers
sequence({ reasoning, space(), content_until(section_start), space(), tools, space(), content, end() })

// Forced thinking handling:
// reasoning becomes optional(reasoning) when thinking_forced_open && tools are present
```

**Trigger Word Detection**:

- Uses `tool_section_start` as primary trigger
- Falls back to `function_prefix` or `per_call_start`
- Raw JSON uses regex pattern trigger

**Lazy Grammar Optimization**:

- Enabled by default for performance
- Disabled when thinking forced open
- Disabled when no clear trigger word exists

## Testing & Debugging

### Comprehensive Test Coverage

The test suite covers:

**Reasoning Models**:

- Qwen-QwQ-32B (forced-open thinking)
- DeepSeek R1 variants (reasoning only)
- IBM Granite (reasoning + tools)
- ByteDance Seed-OSS (custom reasoning tags)
- Ministral-3-14B-Reasoning
- llama-cpp-deepseek-r1

**Tool Call Formats**:

- JSON: Llama 3.x, Mistral Nemo, Hermes, MiMo-VL
- XML: Nemotron, Qwen3-Coder, MiniMax
- Tagged: GLM-4.6 (key-value tags)
- Bracket-tag: Mistral Small 3.2, Devstral
- Prefixed-indexed: Kimi-K2 variants
- Name-as-key: Apertus-8B
- Recipient-based: Functionary v3.2

**Edge Cases**:

- Streaming/partial parsing
- Empty content with tools
- Parallel tool calls
- Forced thinking mode
- Multi-byte Unicode markers
- Null content handling
- Multi-line code in tool arguments
- Custom reasoning tags (ByteDance Seed-OSS)

### Debug Tools

**Template Debugger**: `tests/debug-template-parser.cpp`

- Usage: `./bin/debug-template-parser path/to/template.jinja`
- Shows detected format, markers, generated parser, and GBNF grammar

**Debug Logging**: Enable with `LLAMA_LOG_VERBOSITY=2`

- Shows detailed analysis steps
- Displays pattern extraction results
- Lists
  generated parser structure

**PEG Test Builder**: Fluent API for creating test cases

```cpp
auto tst = peg_tester("template.jinja");
tst.test("input")
    .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
    .tools({tool})
    .expect(expected_message)
    .run();
```

## Adding Support for New Templates

To support a new template format:

1. **If it follows standard patterns** - The auto-parser should detect it automatically
2. **If it has unique markers** - Add the markers to the detection patterns in:
   - `detect_reasoning_markers()` for reasoning tags
   - `detect_content_markers()` for content wrappers
   - `extract_patterns_from_differences()` for tool call patterns
3. **If it needs special handling** - Add a dedicated handler in `chat.cpp` before the auto-parser block

## Edge Cases and Quirks

1. **Forced Thinking**: If `enable_thinking` is true and the prompt already ends with an open reasoning block (e.g., ``), the parser enters "forced thinking" mode, in which it immediately expects reasoning content.
2. **Ambiguous Content**: Templates that mix content and tool calls without clear delimiters are hard to analyze. To stay robust, the analyzer looks for start/end patterns that are common across multiple rendered examples.
3. **Double Wrapping**: Some templates (e.g., Functionary) use the same string for both the tool section start and the function prefix.

| Template | Format | Notes |
|----------|--------|-------|
| … | … | `name\n......` format |
| Kimi-K2 / Kimi-K2-Instruct / Kimi-K2-Thinking | `FUNC_PREFIXED_INDEXED` | `functions.name:0` with special markers |
| Apertus-8B-Instruct | `FUNC_NAME_AS_KEY` | `{"function_name": {...}}` format |
| MiniMax-M2 | `FUNC_TAG_WITH_NAME` | XML invoke with parameter tags |
| NVIDIA-Nemotron-Nano-v2 | `FUNC_JSON_OBJECT` | `` wrapper (nested) |
| Mistral-Nemo-Instruct-2407 | `FUNC_JSON_OBJECT` | `[TOOL_CALLS]` wrapper with id field |
| Functionary v3.1 | `FUNC_TAG_WITH_NAME` | `` non-nested format |
| Functionary v3.2 | `FUNC_RECIPIENT_BASED` | `>>>` recipient delimiter format |
| MiMo-VL / Hermes 3 / Qwen 2.5 | `FUNC_JSON_OBJECT` | `` wrapper |
| Apriel 1.5 | `FUNC_JSON_OBJECT` | `` wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning only | Implicit reasoning start |
| Cohere Command-R7B | `FUNC_JSON_OBJECT` | `START_RESPONSE/ACTION/THINKING` markers |
| Mistral Small 3.2 | `FUNC_BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` with ID |
| Devstral | `FUNC_BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` without ID |
| Ministral-3-14B-Reasoning | Custom reasoning | `[THINK]...[/THINK]` tags |
| IBM Granite | `FUNC_JSON_OBJECT` | `` + `` |
| ByteDance Seed-OSS | `FUNC_TAG_WITH_NAME` | Custom `` and `` tags |
| Qwen3-Coder | `FUNC_TAG_WITH_NAME` | XML-style tool format |
| Cohere Command-R Plus | `FUNC_MARKDOWN_CODE_BLOCK` | `Action:\n\`\`\`json\n[...]\n\`\`\`` format |

### Currently Unsupported Templates

| Template Family | Model / Variant | Issue Description |
|-----------------|-----------------|-------------------|
| **OpenAI** | `GPT-OSS` | Complex channel markers need new format |

### Templates Without Tool Support

Some templates genuinely don't support tool calls (this is not a detection bug):

- **Phi 3.5 Mini** - The official template has no tool handling.
  Use Phi-4-mini-instruct for function calling, or community fine-tuned versions.
- **Google Gemma 2 2B** - Pure instruction-following model without tool capabilities.

### TODO / Roadmap

- [ ] **Fix OpenAI GPT-OSS**: Add `FUNC_CHANNEL_BASED` format for channel marker structure.
- [x] **~~Fix Cohere Command-R Plus~~**: Added `FUNC_MARKDOWN_CODE_BLOCK` format for `Action:\n\`\`\`json` structure.

### Recent Additions (Dec 2025 - Jan 2026)

- **FUNC_RECIPIENT_BASED**: Support for Functionary v3.2's `>>>` recipient delimiter format
- **FUNC_BRACKET_TAG**: Support for Mistral Small 3.2 and Devstral's `[TOOL_CALLS]...` format
- **Enhanced Content Detection**: Better handling of custom reasoning tags and content wrappers
- **Improved Streaming Support**: Better handling of partial parsing for all supported formats
- **Custom Tag Support**: Support for non-standard reasoning tags like `` (ByteDance)
- **Multi-line Tool Arguments**: Better parsing of complex tool arguments with code blocks
- **FUNC_MARKDOWN_CODE_BLOCK**: Support for Cohere Command-R Plus markdown code block format
- **Implicit Reasoning Support**: Support for templates where reasoning starts implicitly without a start marker.

The auto-parser now successfully handles 25+ different template formats across reasoning-only, tool-calling, and hybrid models, with comprehensive test coverage ensuring robust parsing across streaming and non-streaming scenarios.
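
### Illustration: Deriving an Opening Tag from a Closing Tag

Method 1 of the reasoning detection (Differential Reasoning Content Analysis) derives a missing opening tag from a closing tag using three patterns: XML-style, `<|END_X|>` → `<|START_X|>`, and `<|/X|>` → `<|X|>`. The snippet below is a minimal sketch of that idea only. The function name `derive_opening_tag` and the sample tag strings are assumptions made for illustration and are not taken from the analyzer sources, which may handle more cases or validate tags differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the actual sources): given a closing
// reasoning marker, guess the matching opening marker using the three
// patterns listed under Method 1. Returns "" when no pattern applies.
static std::string derive_opening_tag(const std::string & closing) {
    auto starts_with = [&](const std::string & prefix) {
        return closing.size() >= prefix.size() &&
               closing.compare(0, prefix.size(), prefix) == 0;
    };

    // Special token, END/START pair: <|END_X|> -> <|START_X|>
    const std::string end_prefix = "<|END_";
    if (starts_with(end_prefix)) {
        return "<|START_" + closing.substr(end_prefix.size());
    }

    // Special token, slash form: <|/X|> -> <|X|>
    const std::string slash_prefix = "<|/";
    if (starts_with(slash_prefix)) {
        return "<|" + closing.substr(slash_prefix.size());
    }

    // XML-style: </x> -> <x>
    const std::string xml_prefix = "</";
    if (starts_with(xml_prefix)) {
        return "<" + closing.substr(xml_prefix.size());
    }

    return "";
}

// Usage sketch with illustrative tag strings.
static void derive_opening_tag_examples() {
    assert(derive_opening_tag("</think>") == "<think>");
    assert(derive_opening_tag("<|END_THINKING|>") == "<|START_THINKING|>");
    assert(derive_opening_tag("<|/think|>") == "<|think|>");
    assert(derive_opening_tag("no tag here").empty());
}
```

The special-token patterns are checked before the XML pattern so that each input is matched by its most specific prefix. A real implementation would presumably also apply the role-marker blacklist mentioned under Method 2 before accepting a derived tag.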