# Unified Auto-Parser Architecture

The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.

## Overview

The unified auto-parser uses a **pure differential, compositional approach** to analyze chat templates:

**Core Philosophy**:

- **Zero Hardcoded Patterns**: All markers extracted through template comparison (the **only heuristic** is JSON detection)
- **Compositional Architecture**: Separate parsers for reasoning, content, and tools that compose cleanly
- **Variant Types**: Structural descriptions (strings) instead of forced enum classification

**Two-Phase Analysis**:

1. **Phase 1: Content & Reasoning Analysis** - Analyzes how the template handles basic content and reasoning, without considering tools
2. **Phase 2: Tool Call Analysis** - Analyzes tool calling patterns, layered on top of Phase 1

## Data Structures

### content_structure (Phase 1 Result)

Describes how the template handles content and reasoning:

```cpp
struct content_structure {
    enum reasoning_mode_type {
        REASONING_NONE,         // No reasoning markers detected
        REASONING_OPTIONAL,     // ... may appear before content
        REASONING_FORCED_OPEN,  // Template ends with open reasoning tag OR starts implicitly (empty start, present end)
    };

    reasoning_mode_type reasoning_mode = REASONING_NONE;
    std::string reasoning_start;  // e.g., "", "<|START_THINKING|>"
    std::string reasoning_end;    // e.g., "", "<|END_THINKING|>"

    // Content wrapping mode
    enum content_mode_type {
        CONTENT_PLAIN,                   // No content markers
        CONTENT_ALWAYS_WRAPPED,          // ... always present
        CONTENT_WRAPPED_WITH_REASONING,  // Content wrapped only when reasoning present
    };

    content_mode_type content_mode = CONTENT_PLAIN;
    std::string content_start;  // e.g., "", "<|START_RESPONSE|>"
    std::string content_end;    // e.g., "", "<|END_RESPONSE|>"
};
```

### diff_analysis_result (Analysis Result)

The result of differential analysis contains all extracted markers and format classifications:

```cpp
struct diff_analysis_result {
    // Classification results
    reasoning_mode  reasoning = reasoning_mode::NONE;
    content_mode    content   = content_mode::PLAIN;
    tool_format     tools     = tool_format::NONE;
    argument_format args      = argument_format::JSON;

    // All extracted markers (see marker_registry below)
    marker_registry markers;

    // JSON field names (for JSON-based formats)
    std::string name_field = "name";
    std::string args_field = "arguments";
    std::string id_field;

    // Flags
    bool supports_tools           = false;
    bool supports_parallel_calls  = false;
    bool requires_nonnull_content = false;

    // Preserved tokens for tokenizer
    std::vector<std::string> preserved_tokens;
};
```

### marker_registry (Extracted Markers)

All markers are extracted via differential analysis without hardcoded patterns:

```cpp
struct marker_registry {
    // === Reasoning markers ===
    std::string reasoning_start;  // e.g., "", "[THINK]", "<|START_THINKING|>"
    std::string reasoning_end;    // e.g., "", "[/THINK]", "<|END_THINKING|>"

    // === Content markers ===
    std::string content_start;  // e.g., "", ">>>all\n"
    std::string content_end;    // e.g., ""

    // === Tool section markers ===
    std::string tool_section_start;  // e.g., "", "[TOOL_CALLS]"
    std::string tool_section_end;    // e.g., "", "]"
    std::string per_call_start;      // e.g., "\u2985" (for multi-call templates)
    std::string per_call_end;        // e.g., " \u2985"
    std::string call_separator;      // e.g., ",", "\n"

    // === Function markers ===
    std::string func_name_prefix;  // e.g., "", "\""
    std::string func_close;        // e.g., ""
    std::string args_start;        // e.g., "{", " \u300b"
    std::string args_end;          // e.g., "}", ""

    // === Argument markers (for tagged args format) ===
    std::string arg_name_prefix;   // e.g., ""
    std::string arg_name_suffix;   // e.g., ">", ""
    std::string arg_value_prefix;  // e.g., "", ""
    std::string arg_value_suffix;  // e.g., "", ""
    std::string arg_separator;

    // === Special markers ===
    std::string code_block_marker;   // e.g., "Action:" (markdown code block format)
    std::string id_marker;           // e.g., "[CALL_ID]" (bracket-tag format)
    std::string function_namespace;  // e.g., "functions." (prefixed-indexed format)
};
```

## Tool Calling Formats

The auto-parser recognizes three primary tool calling formats. Other formats may be deprecated in future versions.

### JSON_NATIVE

**Structure**: The entire tool call (function name, arguments, and values) is in JSON format. There may be enclosing tags around the tool calling section.

**Characteristics**:

- Function name is a JSON field: `"name": "function_name"`
- Arguments are a JSON object: `"arguments": {"key": "value"}`
- May be wrapped in section markers like `...` or `[TOOL_CALLS]...]`

**Examples**:

Standard OpenAI-style:

```json
{"name": "get_weather", "arguments": {"location": "Paris", "unit": "celsius"}}
```

Mistral Nemo with array wrapper:

```json
[TOOL_CALLS] [{"name": "calculate", "arguments": {"expr": "2+2"}}]
```

Hermes-style with tool_calls wrapper:

```json
{"name": "search", "arguments": {"query": "llama.cpp"}}
```

**Detection**: `args_start == "{"`, `args_end == "}"`, no function name prefix markers

---

### TAG_WITH_JSON

**Structure**: The function name is outside the JSON structure, typically within quasi-XML markers. Arguments are still provided as a JSON object.
**Characteristics**:

- Function name appears in tag attributes: `` or ``
- Arguments are a JSON object following the tag
- Has closing tags: `` or ``
- Arguments remain valid JSON

**Examples**:

Nemotron-style:

```xml
get_weather{"location": "Paris"}
```

Functionary v3.1:

```xml
{"location": "Paris", "unit": "celsius"}
```

ByteDance Seed-OSS:

```xml
get_weather {"location": "Paris"}
```

MiniMax:

```xml
calculate {"expr": "2+2"}
```

**Detection**: `func_name_prefix` starts with `<`, `args_start == "{"`, arguments are JSON

---

### TAG_WITH_TAGGED

**Structure**: Both the function name AND argument names are in XML-style tags. Argument values may be JSON or unquoted primitives depending on schema type.

**Characteristics**:

- Function name in tag: `` or ``
- Each argument has its own tag: `value`
- String values are **unquoted** (raw text content of the tag)
- Non-string values (objects, arrays, numbers, booleans) are still JSON-formatted
- Supports streaming: partial arguments can be parsed incrementally

**Examples**:

Qwen/Hermes XML format:

```xml
Paris celsius
```

Note how string values (`Paris`, `celsius`) are unquoted inside the tags.
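The quoting rule for tagged arguments (raw text for strings, JSON for everything else) can be sketched as a small conversion step. This is only an illustrative sketch: `tagged_value_to_json` and its `schema_type` parameter are hypothetical names introduced here, not part of the auto-parser API, and the actual implementation may treat values differently.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of the auto-parser API): convert the raw text
// found inside an argument tag into a JSON value, based on the schema type.
std::string tagged_value_to_json(const std::string & schema_type, const std::string & raw) {
    if (schema_type != "string") {
        // Objects, arrays, numbers and booleans are already JSON-formatted.
        return raw;
    }
    // String values arrive unquoted, so quote them and escape special characters.
    std::string out = "\"";
    for (unsigned char c : raw) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \uXXXX form.
                    char buf[7];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    out += '"';
    return out;
}
```

Under these assumptions, `tagged_value_to_json("string", "Paris")` yields the JSON string `"Paris"`, while `tagged_value_to_json("object", "{\"round\": true}")` returns its input unchanged.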
Mixed types example:

```xml
2+2 2 {"round": true}
```

Here:

- `expr` and `precision` are strings (unquoted)
- `options` is an object (JSON-formatted inside the tag)

**Detection**: `arg_name_prefix` is non-empty, arguments use tagged format rather than JSON object

---

### Other Formats (To Be Deprecated)

The following formats are currently supported but will likely be deprecated:

| Format | Description | Example |
|--------|-------------|---------|
| `BRACKET_TAG` | Bracket-based markers | `[TOOL_CALLS]func[ARGS]{...}` |
| `PREFIXED_INDEXED` | Namespace prefix with index | `functions.name:0{...}` |
| `RECIPIENT_BASED` | Recipient routing | `>>>recipient\n{content}` |
| `MARKDOWN_BLOCK` | Markdown code blocks | `` Action:\n```json\n[...] `` |

## Analysis Flow

```console
Template
    |
    v
Phase 1: analyze_content_structure()
    |-- detect_reasoning_markers() - compare outputs with reasoning_content vs without
    |-- detect_content_markers() - render with content and detect wrapping
    |-- detect_reasoning_mode() - check if prompt ends with open tag
    |
    v
content_structure
    |
    v
Phase 2: analyze_tool_structure()
    |-- Check minja.supports_tool_calls
    |-- Differential analysis for tool patterns
    |-- Classify function format (JSON vs tagged)
    |-- Classify argument format (JSON vs tagged)
    |
    v
diff_analysis_result
    |
    v
generate_parser(diff_analysis_result)
    |-- build_reasoning_block(diff_analysis_result)
    |-- build_content_block(diff_analysis_result)
    |-- build_tool_section(diff_analysis_result, tools)
    |-- Compose into final parser
    |
    v
common_chat_params (parser, grammar, triggers, preserved_tokens)
```

## Entry Point

The mechanism starts in `common/chat.cpp`, in `common_chat_templates_apply_jinja`:

```cpp
// 1. Analyze the template (two-phase)
auto analysis = differential_analyzer::analyze(tmpl);

// 2. Generate the parser and grammar
auto auto_params = universal_peg_generator::generate_parser(tmpl, params);

// 3. Use if it provides more than basic content handling
if (auto_params.format != COMMON_CHAT_FORMAT_CONTENT_ONLY || !auto_params.parser.empty()) {
    return auto_params;
}
```

## Builder Methods

The unified builder (`common_chat_peg_unified_builder`) provides high-level methods:

- `build_reasoning_block(analysis, reasoning_format, thinking_forced_open)` - Build reasoning parser
- `build_content_block(analysis, reasoning_format)` - Build content parser
- `build_tool_section(analysis, tools, parallel_tool_calls, force_tool_calls)` - Build tool section
- `build_function(analysis, name, schema)` - Build single function parser
- `build_arguments(analysis, schema)` - Build arguments parser

## Key Templates Supported

- **Granite** - `` + `` with tool calls
- **Nemotron** - JSON tools with `` wrapper
- **Qwen/Hermes** - XML-style `` format (TAG_WITH_TAGGED)
- **Command-R7B** - `<|START_THINKING|>`/`<|START_RESPONSE|>` + `<|START_ACTION|>` tools
- **DeepSeek R1** - Forced thinking + complex tools
- **Mistral Nemo** - `[TOOL_CALLS]` wrapper (JSON_NATIVE)
- **MiniMax** - `` wrapper with JSON args (TAG_WITH_JSON)
- **GLM-4.6** - `` + `name\n......` format
- **Kimi-K2** - `PREFIXED_INDEXED` format with namespace and indices
- **Mistral Small 3.2** - `BRACKET_TAG` format with `[TOOL_CALLS]` markers
- **Functionary v3.2** - `RECIPIENT_BASED` format with `>>>` routing

## Files

| File | Purpose |
|------|---------|
| `common/chat-auto-parser.h` | Data structures and API declarations |
| `common/chat-diff-analyzer.h/cpp` | Differential analysis implementation |
| `common/chat-auto-parser-generator.cpp` | PEG parser generator |
| `common/chat-auto-parser-helpers.h/cpp` | Shared helper functions |
| `common/chat-peg-parser.h/cpp` | Unified builder and mapper classes |
| `common/chat.cpp` | Main entry point and wire-up |

## Algorithm Details

### Phase 1: Content & Reasoning Analysis

#### Reasoning Detection (4 Methods)

**Method 1: Differential Reasoning Content Analysis**

- Render template with `reasoning_content` field present vs absent
- Compare outputs to find markers between reasoning and content
- If only closing tag found, derive opening tag using patterns:
  - XML: `` → ``
  - Special tokens: `<|END_X|>` → `<|START_X|>`, `<|/X|>` → `<|X|>`
- Handles various tag formats including XML and special token formats

**Method 2: Enable-Thinking Toggle Analysis**

- Toggle `enable_thinking` context variable between true/false
- Detects differences in generated prompts
- Handles two scenarios:
  - **Normal case**: enable_thinking=true adds reasoning markers
  - **Reverse case**: enable_thinking=false adds empty thinking block (GLM-4.6 style)
- Uses string difference analysis to extract markers
- Validates extracted tags against blacklist of role markers

**Method 3: Prompt Ending Analysis**

- Checks if prompt ends with unclosed reasoning tag
- Looks for trailing tags in prompt with `enable_thinking=true`
- Differentiates between open tags (``) and close tags (``)
- Handles blacklisted tags (role markers, system tokens)
- Validates reasoning-like patterns (contains "think", "reason", "thought")

**Method 4: Adjacent Tag Pair Detection**

- Looks for patterns like ``, `<|START_THINKING|><|END_THINKING|>`, `[think][/think]`
- Searches for predefined tag patterns in prompt
- Validates tags are adjacent with only whitespace between
- Supports both simple and complex token formats

#### Content Detection Algorithm

1. **Dual-Mode Rendering**: Render template with content marker in both thinking-enabled and thinking-disabled modes
2. **Pattern Matching**: Search for known content wrapper patterns:
   - `<|START_RESPONSE|>` / `<|END_RESPONSE|>`
   - `` / ``
   - `` / ``
   - `` / ``
   - `<|CHATBOT_TOKEN|>` / `<|END_OF_TURN_TOKEN|>`
3. **Mode Classification**:
   - `CONTENT_ALWAYS_WRAPPED`: Found in both thinking modes
   - `CONTENT_WRAPPED_WITH_REASONING`: Found only with thinking enabled
   - `CONTENT_PLAIN`: No wrapping detected

#### Reasoning Mode Detection

- **REASONING_FORCED_OPEN**:
  - **Explicit**: Prompt ends with reasoning start marker (e.g., ``).
  - **Implicit**: reasoning end marker is present but start marker is empty (e.g., `[BEGIN FINAL RESPONSE]`).
- **REASONING_OPTIONAL**: Markers present but not forced.
- **REASONING_NONE**: No markers detected.

### Phase 2: Tool Call Structure Analysis

#### Pure Differential Analysis Algorithm

**Key Principle**: All patterns are extracted through template comparison. The **only heuristic** is detecting JSON vs marker-based structures (via JSON parse attempt). No hardcoded pattern lists.

**Comparison Matrix**:

| Comparison | Purpose | What's Extracted |
|------------|---------|------------------|
| **T1**: No tools vs tools | Tool section markers | `tool_section_start`, `tool_section_end` |
| **T2**: 1 call vs 2 calls | Call separators | `per_call_start`, `call_separator` |
| **T3**: func_alpha vs func_beta | Function boundaries | `func_name_prefix`, `func_name_suffix` |
| **T4**: 1 arg vs 2 args | Argument separator | `arg_separator` |
| **T5**: No args vs args | Args container | `args_start`, `args_end` |
| **A1**: key1 vs key2 | Arg name boundaries | `arg_name_prefix`, `arg_name_suffix` |
| **A2**: value A vs B | Arg value boundaries | `arg_value_prefix`, `arg_value_suffix` |
| **A3**: number vs string | Quoting behavior | Value type handling |

**Structural Extraction Helpers**:

```cpp
// Extract last structural marker from string (finds last <, [, {, or ")
std::string extract_structural_suffix(const std::string & str);

// Extract first structural marker from string (finds first >, ], }, or ")
std::string extract_structural_prefix(const std::string & str);

// The only heuristic: detect if content is valid JSON
bool is_json_based(const std::string & content);
```

**Pattern Extraction Process** (Example - T1: Tool Section Markers):

1. Render template with/without tool calls
2. Compute diff: `calculate_diff_split(output_no_tools, output_with_tools)`
3. Use controlled function name (`func_alpha`) as anchor in `diff.right`
4. Extract structural prefix before function name → `tool_section_start`
5. Extract structural suffix after tool content → `tool_section_end`

**No Pattern Lists**: Unlike the old approach, there are no hardcoded lists like `["", "[TOOL_CALLS]", ...]`. All markers are discovered through differential comparison.

#### Variant Detection Logic

Instead of forcing patterns into enum types, the analyzer detects **variant types** as strings that describe the structural characteristics:

**Variant Types**:

- `"json-native"`: Pure JSON tool calls (Llama, Mistral Nemo)
- `"tagged-json"`: Function name in markers, args in JSON (Functionary v3.1, Nemotron)
- `"tagged-args"`: Full XML-style with tagged arguments (Qwen, Hermes, MiniMax)
- `"bracket-tag"`: Bracket markers (Mistral Small 3.2: `[TOOL_CALLS]func[ARGS]{...}`)
- `"recipient-based"`: Recipient routing (Functionary v3.2: `>>>func_name`)
- `"markdown-block"`: Markdown code blocks (Cohere Command-R Plus)
- `"prefixed-indexed"`: Namespace prefix with indices (Kimi-K2: `functions.name:0`)

**Detection Strategy** (from most to least distinctive):

```cpp
void detect_tool_variant(diff_analysis_result & result) {
    // 1. Check for unique markers (most distinctive)
    if (!result.markers.id_marker.empty()) → "bracket-tag"
    if (markers contain ">>>") → "recipient-based"
    if (code_block_marker present) → "markdown-block"
    if (function_namespace or suffix contains ':') → "prefixed-indexed"

    // 2. Check argument structure (JSON variants)
    if (arg_name_prefix starts with '<') → "tagged-args"
    if (func_name_prefix starts with '<') → "tagged-json"

    // 3. Default
    → "json-native"
}
```

#### Compositional Parser Building

The analyzer builds separate, composable parsers for each component:

**Reasoning Parser**:

- Built from `reasoning_start` and `reasoning_end` markers
- Supports tag-based, delimiter, and forced-open modes

**Content Parser**:

- Built from `content_start` and `content_end` markers
- Supports plain, always-wrapped, and conditionally-wrapped modes

**Tool Parser** (variant-specific):

- Built based on `variant_type` detection
- Each variant has its own builder that uses the extracted markers
- No enum forcing - structure preserved as discovered

**Final Composition**:

```cpp
sequence({ reasoning_parser, space(), content_parser, space(), tool_parser, end() })
```

### Generator Algorithms

#### Unified Parser Building

**Composition Strategy**:

```cpp
// Standard format
sequence({ reasoning, space(), content, space(), tools, space(), content, end() })

// With section markers
sequence({ reasoning, space(), content_until(section_start), space(), tools, space(), content, end() })

// Forced thinking handling
optional(reasoning) when thinking_forced_open && tools present
```

**Trigger Word Detection**:

- Uses `tool_section_start` as primary trigger
- Falls back to `function_prefix` or `per_call_start`
- Raw JSON uses regex pattern trigger

**Lazy Grammar Optimization**:

- Enabled by default for performance
- Disabled when thinking forced open
- Disabled when no clear trigger word exists

## Testing & Debugging

### Comprehensive Test Coverage

The test suite covers:

**Reasoning Models**:

- Qwen-QwQ-32B (forced-open thinking)
- DeepSeek R1 variants (reasoning only)
- IBM Granite (reasoning + tools)
- ByteDance Seed-OSS (custom reasoning tags)
- Ministral-3-14B-Reasoning
- llama-cpp-deepseek-r1

**Tool Call Formats**:

- JSON_NATIVE: Llama 3.x, Mistral Nemo, Hermes, MiMo-VL
- TAG_WITH_JSON: Nemotron, Qwen3-Coder, MiniMax
- TAG_WITH_TAGGED: Qwen, Hermes (XML), ByteDance Seed-OSS
- BRACKET_TAG: Mistral Small 3.2, Devstral
- PREFIXED_INDEXED: Kimi-K2 variants
- RECIPIENT_BASED: Functionary v3.2
- MARKDOWN_BLOCK: Cohere Command-R Plus

**Edge Cases**:

- Streaming/partial parsing
- Empty content with tools
- Parallel tool calls
- Forced thinking mode
- Multi-byte Unicode markers
- Null content handling
- Multi-line code in tool arguments
- Custom reasoning tags (ByteDance Seed-OSS)

### Debug Tools

**Template Debugger**: `tests/debug-template-parser.cpp`

- Usage: `./bin/debug-template-parser path/to/template.jinja`
- Shows detected format, markers, generated parser, and GBNF grammar

**Debug Logging**: Enable with `LLAMA_LOG_VERBOSITY=2`

- Shows detailed analysis steps
- Displays pattern extraction results
- Lists generated parser structure

**PEG Test Builder**: Fluent API for creating test cases

```cpp
auto tst = peg_tester("template.jinja");
tst.test("input")
    .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
    .tools({tool})
    .expect(expected_message)
    .run();
```

## Adding Support for New Templates

To support a new template format:

1. **If it follows standard patterns** - The auto-parser should detect it automatically using the three main formats (JSON_NATIVE, TAG_WITH_JSON, TAG_WITH_TAGGED)
2. **If it has unique markers** - Add differential analysis patterns in:
   - `compare_reasoning_presence()` for reasoning tags
   - `compare_content_values()` for content wrappers
   - `extract_tool_section()` for tool call patterns
3. **If it needs special handling** - Add a dedicated handler in `chat.cpp` before the auto-parser block

## Edge Cases and Quirks

1. **Forced Thinking**: If `enable_thinking` is true but the model has already started a thought block (e.g., ended the prompt with ``), the parser enters "forced thinking" mode where it immediately expects reasoning content.
2. **Ambiguous Content**: Templates that mix content and tool calls without clear delimiters can be tricky. The analyzer tries to find "common" start/end patterns across multiple examples to be robust.
3. **Double Wrapping**: Some templates (e.g., Functionary) use the same string for both the tool section start and the function prefix.

| Template | Format | Notes |
|----------|--------|-------|
| | | `name\n......` format |
| Kimi-K2 / Kimi-K2-Instruct / Kimi-K2-Thinking | `PREFIXED_INDEXED` | `functions.name:0` with special markers |
| Apertus-8B-Instruct | `NAME_AS_KEY` | `{"function_name": {...}}` format |
| MiniMax-M2 | `TAG_WITH_JSON` | XML invoke with parameter tags |
| NVIDIA-Nemotron-Nano-v2 | `JSON_NATIVE` | `` wrapper (nested) |
| Mistral-Nemo-Instruct-2407 | `JSON_NATIVE` | `[TOOL_CALLS]` wrapper with id field |
| Functionary v3.1 | `TAG_WITH_JSON` | `` non-nested format |
| Functionary v3.2 | `RECIPIENT_BASED` | `>>>` recipient delimiter format |
| MiMo-VL / Hermes 3 / Qwen 2.5 | `JSON_NATIVE` | `` wrapper |
| Apriel 1.5 | `JSON_NATIVE` | `` wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning only | Implicit reasoning start |
| Cohere Command-R7B | `JSON_NATIVE` | START_RESPONSE/ACTION/THINKING markers |
| Mistral Small 3.2 | `BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` with ID |
| Devstral | `BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` without ID |
| Ministral-3-14B-Reasoning | Custom reasoning | `[THINK]...[/THINK]` tags |
| IBM Granite | `JSON_NATIVE` | `` + `` |
| ByteDance Seed-OSS | `TAG_WITH_TAGGED` | Custom `` and `` tags |
| Qwen3-Coder | `TAG_WITH_TAGGED` | XML-style tool format |
| Cohere Command-R Plus | `MARKDOWN_BLOCK` | `` Action:\n```json\n[...]\n``` `` format |

### Currently Unsupported Templates

| Template Family | Model / Variant | Issue Description |
|-----------------|-----------------|-------------------|
| **OpenAI** | `GPT-OSS` | Complex channel markers need new format |

### Templates Without Tool Support

Some templates genuinely don't support tool calls (this is not a detection bug):

- **Phi 3.5 Mini** - The official template has no tool handling. Use Phi-4-mini-instruct for function calling, or community fine-tuned versions.
- **Google Gemma 2 2B** - Pure instruction-following model without tool capabilities.

### TODO / Roadmap

- [ ] **Fix OpenAI GPT-OSS**: Add handling for channel marker structure.
- [x] **~~Fix Cohere Command-R Plus~~**: Added `MARKDOWN_BLOCK` format for `` Action:\n```json `` structure.

### Recent Additions (Dec 2025 - Jan 2026)

- **RECIPIENT_BASED**: Support for Functionary v3.2's `>>>` recipient delimiter format
- **BRACKET_TAG**: Support for Mistral Small 3.2 and Devstral's `[TOOL_CALLS]...` format
- **Enhanced Content Detection**: Better handling of custom reasoning tags and content wrappers
- **Improved Streaming Support**: Better handling of partial parsing for all supported formats
- **Custom Tag Support**: Support for non-standard reasoning tags like `` (ByteDance)
- **Multi-line Tool Arguments**: Better parsing of complex tool arguments with code blocks
- **MARKDOWN_BLOCK**: Support for Cohere Command-R Plus markdown code block format
- **Implicit Reasoning Support**: Support for templates where reasoning starts implicitly without a start marker.
- **Pure Differential Refactoring (Jan 2026)**: Complete refactoring to eliminate hardcoded patterns:
  - Removed all hardcoded pattern lists (previously had `["", "[TOOL_CALLS]", ...]`)
  - Added structural extraction helpers (`extract_structural_suffix`, `extract_structural_prefix`)
  - Replaced enum-based classification with string-based variant types
  - Only remaining heuristic: JSON detection via parse attempt
  - All markers now discovered through differential template comparison
- **Three Primary Tool Formats**: Consolidated tool calling formats to JSON_NATIVE, TAG_WITH_JSON, and TAG_WITH_TAGGED for clarity and maintainability

The auto-parser now successfully handles 25+ different template formats across reasoning-only, tool-calling, and hybrid models, with comprehensive test coverage ensuring robust parsing across streaming and non-streaming scenarios.
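As a closing illustration of the differential comparison that all of these formats rely on, the following is a minimal sketch of splitting two rendered prompts into their shared and differing parts, then locating the text in front of a controlled function name. The `diff_split` struct, `split_by_common_affixes`, and `text_before_anchor` are hypothetical names chosen for this example; the real `calculate_diff_split` may use different fields and a different algorithm, and reducing the extracted text to a single structural marker is left to the helpers described earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type; the fields of the real diff result may differ.
struct diff_split {
    std::string prefix;  // text shared at the start of both renders
    std::string left;    // middle part present only in the first render
    std::string right;   // middle part present only in the second render
    std::string suffix;  // text shared at the end of both renders
};

// Split two strings into common prefix, differing middles, and common suffix.
diff_split split_by_common_affixes(const std::string & a, const std::string & b) {
    const std::size_t max_common = std::min(a.size(), b.size());

    std::size_t p = 0;
    while (p < max_common && a[p] == b[p]) {
        p++;
    }

    // The suffix must not overlap the prefix in the shorter string.
    std::size_t s = 0;
    while (s < max_common - p && a[a.size() - 1 - s] == b[b.size() - 1 - s]) {
        s++;
    }

    return {
        a.substr(0, p),
        a.substr(p, a.size() - p - s),
        b.substr(p, b.size() - p - s),
        a.substr(a.size() - s),
    };
}

// Return the part of `right` that precedes the controlled function name,
// or an empty string if the anchor does not occur.
std::string text_before_anchor(const std::string & right, const std::string & anchor) {
    const std::size_t pos = right.find(anchor);
    return pos == std::string::npos ? std::string() : right.substr(0, pos);
}
```

With a no-tools render of `<|user|>hi<|end|>` and a with-tools render that adds a `[TOOL_CALLS]` section naming `func_alpha`, the `right` part holds only the tool section, and the text before the anchor begins with `[TOOL_CALLS]`.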