Unified Auto-Parser Architecture

The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.

Overview

The unified auto-parser uses a two-phase incremental analysis approach:

Phase 1: Content & Reasoning Analysis - Analyzes how the template handles basic content and reasoning, without considering tools
Phase 2: Tool Call Analysis - Analyzes tool calling patterns, layered on top of Phase 1

Data Structures

content_structure (Phase 1 Result)

Describes how the template handles content and reasoning:

struct content_structure {
    enum reasoning_mode_type {
        REASONING_NONE,         // No reasoning markers detected
        REASONING_OPTIONAL,     // <think>...</think> may appear before content
        REASONING_FORCED_OPEN,  // Template ends with open reasoning tag OR starts implicitly (empty start, present end)
    };

    reasoning_mode_type reasoning_mode = REASONING_NONE;
    std::string         reasoning_start;  // e.g., "<think>", "<|START_THINKING|>"
    std::string         reasoning_end;    // e.g., "</think>", "<|END_THINKING|>"

    // Content wrapping mode
    enum content_mode_type {
        CONTENT_PLAIN,                   // No content markers
        CONTENT_ALWAYS_WRAPPED,          // <response>...</response> always present
        CONTENT_WRAPPED_WITH_REASONING,  // Content wrapped only when reasoning present
    };

    content_mode_type content_mode = CONTENT_PLAIN;
    std::string       content_start;  // e.g., "<response>", "<|START_RESPONSE|>"
    std::string       content_end;    // e.g., "</response>", "<|END_RESPONSE|>"
};

tool_call_structure (Phase 2 Result)

Describes how the template formats tool calls:

struct tool_call_structure {
    bool supports_tools = false;

    // Container markers (what wraps all tool calls)
    std::string tool_section_start;  // e.g., "<tool_call>", "[TOOL_CALLS]", "<TOOLCALL>", ""
    std::string tool_section_end;    // e.g., "</tool_call>", "]", "</TOOLCALL>", ""

    // Function format (how individual functions are structured)
    enum function_format {
        FUNC_JSON_OBJECT,       // {"name": "X", "arguments": {...}}
        FUNC_TAG_WITH_NAME,     // <function=X>{...}</function>
        FUNC_TAG_NAME_ONLY,     // <X>...</X> where X is function name (rare)
        FUNC_PREFIXED_INDEXED,  // <|tool_call_begin|>functions.X:0<|tool_call_argument_begin|>{...}<|tool_call_end|>
        FUNC_NAME_AS_KEY,       // [{"function_name": {...arguments...}}] (Apertus-style)
        FUNC_BRACKET_TAG,       // [TOOL_CALLS]X[CALL_ID]id[ARGS]{...} (Mistral Small 3.2 style)
        FUNC_RECIPIENT_BASED,   // >>>recipient\n{content} where recipient is "all" (content) or function name (tools)
        FUNC_MARKDOWN_CODE_BLOCK,  // Action:\n```json\n[{"tool_name": "X", ...}]\n``` (Cohere Command-R Plus)
    };
    function_format function_format = FUNC_JSON_OBJECT;

    // For FUNC_JSON_OBJECT format - field names (may vary between templates)
    std::string name_field = "name";       // Could be "tool_name", "function"
    std::string args_field = "arguments";  // Could be "parameters", "params", "input"
    std::string id_field;                  // Optional: "id", "tool_call_id", ""

    // For FUNC_TAG_WITH_NAME format
    std::string function_prefix;  // e.g., "<function="
    std::string function_suffix;  // e.g., ">"
    std::string function_close;   // e.g., "</function>"

    // For FUNC_PREFIXED_INDEXED format (e.g., Kimi-K2)
    std::string per_call_start;      // e.g., "<|tool_call_begin|>"
    std::string function_namespace;  // e.g., "functions." (prefix before function name)
    std::string args_marker;         // e.g., "<|tool_call_argument_begin|>"
    std::string per_call_end;        // e.g., "<|tool_call_end|>"

    // For FUNC_BRACKET_TAG format (e.g., Mistral Small 3.2)
    std::string id_marker;  // e.g., "[CALL_ID]" - marker before tool call ID

    // For FUNC_MARKDOWN_CODE_BLOCK format (Cohere Command-R Plus)
    std::string code_block_marker;    // e.g., "Action:" - text marker before code block
    std::string code_block_language;  // e.g., "json" - language identifier in code fence

    // Argument format (how arguments are structured within a function)
    enum argument_format {
        ARGS_JSON,            // Standard JSON object: {"key": "value", ...}
        ARGS_TAGGED,          // XML-style: <param=key>value</param>
        ARGS_KEY_VALUE_TAGS,  // <arg_key>key</arg_key><arg_value>value</arg_value> (GLM-4.6)
    };
    argument_format argument_format = ARGS_JSON;

    // For ARGS_TAGGED format
    std::string arg_prefix;     // e.g., "<param=", "<parameter="
    std::string arg_suffix;     // e.g., ">"
    std::string arg_close;      // e.g., "</param>", "</parameter>"
    std::string arg_separator;  // e.g., "", "\n"

    // Flag: template renders null content as "None" string, requires empty string instead
    bool requires_nonnull_content = false;
};
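
To make the roles of the marker fields concrete, here is a minimal sketch that assembles the text of one call in the FUNC_TAG_WITH_NAME + ARGS_TAGGED combination. The `tagged_markers` struct and `render_tagged_call` function are hypothetical names introduced for illustration; they mirror a subset of the fields above and are not part of the actual API.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, trimmed-down subset of tool_call_structure: only the marker
// fields used by FUNC_TAG_WITH_NAME combined with ARGS_TAGGED.
struct tagged_markers {
    std::string tool_section_start = "<tool_call>";
    std::string tool_section_end   = "</tool_call>";
    std::string function_prefix    = "<function=";
    std::string function_suffix    = ">";
    std::string function_close     = "</function>";
    std::string arg_prefix         = "<param=";
    std::string arg_suffix         = ">";
    std::string arg_close          = "</param>";
    std::string arg_separator      = "";
};

// Assemble the text a model would emit for one call, showing where each marker lands.
std::string render_tagged_call(const tagged_markers & m,
                               const std::string & name,
                               const std::vector<std::pair<std::string, std::string>> & args) {
    std::string out = m.tool_section_start + m.function_prefix + name + m.function_suffix;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) {
            out += m.arg_separator;  // separator goes between arguments only
        }
        out += m.arg_prefix + args[i].first + m.arg_suffix + args[i].second + m.arg_close;
    }
    return out + m.function_close + m.tool_section_end;
}
```

With the default markers, a call to `get_weather` with a single `city` argument renders as `<tool_call><function=get_weather><param=city>Paris</param></function></tool_call>`. The generated parser has to recognize the same layout in reverse.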

Analysis Flow

Template
    |
    v
Phase 1: analyze_content_structure()
    |-- detect_reasoning_markers() - compare outputs with reasoning_content vs without
    |-- detect_content_markers() - render with content and detect wrapping
    |-- detect_reasoning_mode() - check if prompt ends with open tag
    |
    v
content_structure
    |
    v
Phase 2: analyze_tool_structure()
    |-- Check minja.supports_tool_calls
    |-- Differential analysis for tool patterns
    |-- Classify function format (JSON vs tagged)
    |-- Classify argument format (JSON vs tagged)
    |
    v
tool_call_structure
    |
    v
generate_parser(content_structure, tool_call_structure)
    |-- build_reasoning_block(content_structure)
    |-- build_content_block(content_structure)
    |-- build_tool_section(tool_call_structure, tools)
    |-- Compose into final parser
    |
    v
common_chat_params (parser, grammar, triggers, preserved_tokens)

Entry Point

The mechanism starts in common/chat.cpp, in common_chat_templates_apply_jinja:

// 1. Analyze the template (two-phase)
template_analysis_result analysis = template_analyzer::analyze_template(tmpl);

// 2. Generate the parser and grammar
auto auto_params = universal_peg_generator::generate_parser(analysis, tmpl, params);

// 3. Use if it provides more than basic content handling
if (auto_params.format != COMMON_CHAT_FORMAT_CONTENT_ONLY ||
    auto_params.thinking_forced_open ||
    !auto_params.parser.empty()) {
    return auto_params;
}

Builder Methods

The unified builder (common_chat_peg_unified_builder) provides high-level methods:

build_reasoning_block(cs, reasoning_format, thinking_forced_open) - Build reasoning parser
build_content_block(cs, reasoning_format) - Build content parser
build_tool_section(ts, tools, parallel_tool_calls, force_tool_calls) - Build tool section
build_function(ts, name, schema) - Build single function parser
build_arguments(ts, schema) - Build arguments parser

Key Templates Supported

Granite - <think></think> + <response></response> with tool calls
Nemotron - JSON tools with <TOOLCALL> wrapper
Qwen/Hermes - XML-style <function=X><param=key> format
Command-R7B - <|START_THINKING|>/<|START_RESPONSE|> + <|START_ACTION|> tools
DeepSeek R1 - Forced thinking + complex tools
Mistral Nemo - [TOOL_CALLS] wrapper
MiniMax - <minimax:tool_call> wrapper with XML tools
GLM-4.6 - <tool_call>name\n<arg_key>...<arg_value>... format
Kimi-K2 - FUNC_PREFIXED_INDEXED format with namespace and indices
Mistral Small 3.2 - FUNC_BRACKET_TAG format with [TOOL_CALLS] markers
Functionary v3.2 - FUNC_RECIPIENT_BASED format with >>> routing

Files

| File | Purpose |
| --- | --- |
| `common/chat-auto-parser.h` | Data structures and API declarations |
| `common/chat-auto-parser-analyzer.cpp` | Phase 1 and Phase 2 analysis implementation |
| `common/chat-auto-parser-generator.cpp` | PEG parser generator |
| `common/chat-auto-parser-helpers.h/cpp` | Shared helper functions |
| `common/chat-peg-parser.h/cpp` | Unified builder and mapper classes |
| `common/chat.cpp` | Main entry point and wire-up |

Algorithm Details

Phase 1: Content & Reasoning Analysis

Reasoning Detection (4 Methods)

Method 1: Differential Reasoning Content Analysis

Render template with reasoning_content field present vs absent
Compare outputs to find markers between THOUGHT_MARKER and CONTENT_MARKER
If only closing tag found, derive opening tag using patterns:
- XML: </tag> → <tag>
- Special tokens: <|END_X|> → <|START_X|>, <|/X|> → <|X|>
Handles various tag formats including XML and special token formats
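
The opening-tag derivation rules listed above can be sketched as a small string rewrite. `derive_opening_tag` is a hypothetical helper name; the real analyzer may handle more patterns and edge cases than these three.

```cpp
#include <string>

// Returns true if `s` starts with `prefix`.
static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: derive an opening tag from a closing one.
// Returns an empty string when none of the known patterns applies.
std::string derive_opening_tag(const std::string & closing) {
    if (starts_with(closing, "<|END_")) {
        return "<|START_" + closing.substr(6);  // <|END_X|> -> <|START_X|>
    }
    if (starts_with(closing, "<|/")) {
        return "<|" + closing.substr(3);        // <|/X|> -> <|X|>
    }
    if (starts_with(closing, "</")) {
        return "<" + closing.substr(2);         // </tag> -> <tag>
    }
    return "";
}
```

The special-token patterns are checked before the plain XML pattern, since each closing form has to be matched against its own prefix.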

Method 2: Enable-Thinking Toggle Analysis

Toggle enable_thinking context variable between true/false
Detects differences in generated prompts
Handles two scenarios:
- Normal case: enable_thinking=true adds reasoning markers
- Reverse case: enable_thinking=false adds empty thinking block (GLM-4.6 style)
Uses string difference analysis to extract markers
Validates extracted tags against blacklist of role markers
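
As a rough illustration of the string difference step, the sketch below strips the common prefix and suffix of two rendered prompts and returns what remains of the second one. `inserted_segment` is a hypothetical name, and the approach assumes a single contiguous insertion.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Text that appears only in `b` when `a` and `b` share a common prefix and suffix.
// Simplification: handles a single contiguous insertion only.
std::string inserted_segment(const std::string & a, const std::string & b) {
    const std::size_t max_prefix = std::min(a.size(), b.size());
    std::size_t prefix = 0;
    while (prefix < max_prefix && a[prefix] == b[prefix]) {
        ++prefix;
    }
    // The suffix may not overlap the part already consumed by the prefix.
    const std::size_t max_suffix = max_prefix - prefix;
    std::size_t suffix = 0;
    while (suffix < max_suffix && a[a.size() - 1 - suffix] == b[b.size() - 1 - suffix]) {
        ++suffix;
    }
    return b.substr(prefix, b.size() - prefix - suffix);
}
```

For the normal case, pass the enable_thinking=false prompt as `a` and the enable_thinking=true prompt as `b`; for the reverse case, swap them. A character-level diff like this can cut through a tag when the inserted text shares leading or trailing characters with its surroundings (for example a shared `<`), so a real implementation would need to realign the result to tag boundaries before validating it.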

Method 3: Prompt Ending Analysis

Checks if prompt ends with unclosed reasoning tag
Looks for trailing tags in prompt with enable_thinking=true
Differentiates between open tags (<think>) and close tags (</think>)
Handles blacklisted tags (role markers, system tokens)
Validates reasoning-like patterns (contains "think", "reason", "thought")
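
A minimal sketch of the prompt-ending check follows. It only handles angle-bracket tags, and the function names are hypothetical; the keyword test stands in for both the reasoning-pattern validation and the blacklist, since role markers such as `<|assistant|>` contain none of the keywords.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical check: does the tag look reasoning-related?
static bool looks_like_reasoning(std::string tag) {
    std::transform(tag.begin(), tag.end(), tag.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return tag.find("think") != std::string::npos ||
           tag.find("reason") != std::string::npos ||
           tag.find("thought") != std::string::npos;
}

// Returns the trailing open reasoning tag of `prompt`, or "" if there is none.
// Only <...> style tags are handled in this sketch.
std::string trailing_open_reasoning_tag(const std::string & prompt) {
    const std::size_t end = prompt.find_last_not_of(" \t\r\n");
    if (end == std::string::npos || prompt[end] != '>') {
        return "";
    }
    const std::size_t start = prompt.rfind('<', end);
    if (start == std::string::npos) {
        return "";
    }
    const std::string tag = prompt.substr(start, end - start + 1);
    // Closing forms: </x>, <|/x|>, <|END_X|>
    const bool is_close = tag.compare(0, 2, "</") == 0 ||
                          tag.compare(0, 3, "<|/") == 0 ||
                          tag.compare(0, 6, "<|END_") == 0;
    if (is_close || !looks_like_reasoning(tag)) {
        return "";
    }
    return tag;
}
```

A prompt ending in `<think>\n` yields `<think>`, while one ending in `</think>\n\n` or in a role marker yields an empty string.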

Method 4: Adjacent Tag Pair Detection

Looks for patterns like <think></think>, <|START_THINKING|><|END_THINKING|>, [think][/think]
Searches for predefined tag patterns in prompt
Validates tags are adjacent with only whitespace between
Supports both simple and complex token formats
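
The adjacency check can be sketched as follows, assuming the candidate open/close pair is already known. `has_adjacent_pair` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Returns true if `prompt` contains `open` followed by `close` with only
// whitespace between them, e.g. "<think>\n\n</think>".
bool has_adjacent_pair(const std::string & prompt,
                       const std::string & open,
                       const std::string & close) {
    if (open.empty() || close.empty()) {
        return false;
    }
    std::size_t pos = prompt.find(open);
    while (pos != std::string::npos) {
        // Skip whitespace after the opening tag, then expect the closing tag.
        const std::size_t next = prompt.find_first_not_of(" \t\r\n", pos + open.size());
        if (next != std::string::npos && prompt.compare(next, close.size(), close) == 0) {
            return true;
        }
        pos = prompt.find(open, pos + 1);  // try the next occurrence
    }
    return false;
}
```

An empty thinking block such as `<think>\n\n</think>` matches, whereas `<think>hmm</think>` does not, because non-whitespace text separates the two tags.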

Content Detection Algorithm

Dual-Mode Rendering: Render template with content marker in both thinking-enabled and thinking-disabled modes
Pattern Matching: Search for known content wrapper patterns:
- <|START_RESPONSE|> / <|END_RESPONSE|>
- <response> / </response>
- <output> / </output>
- <answer> / </answer>
- <|CHATBOT_TOKEN|> / <|END_OF_TURN_TOKEN|>
Mode Classification:
- CONTENT_ALWAYS_WRAPPED: Found in both thinking modes
- CONTENT_WRAPPED_WITH_REASONING: Found only with thinking enabled
- CONTENT_PLAIN: No wrapping detected
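
The mode classification above reduces to two boolean observations, one per rendering. The sketch below assumes a candidate wrapper pair has already been chosen; the enum mirrors `content_mode_type`, and the helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

enum content_mode_type {
    CONTENT_PLAIN,
    CONTENT_ALWAYS_WRAPPED,
    CONTENT_WRAPPED_WITH_REASONING,
};

// Hypothetical helper: true if `rendered` contains `start` followed later by `end`.
static bool is_wrapped(const std::string & rendered,
                       const std::string & start,
                       const std::string & end) {
    const std::size_t s = rendered.find(start);
    return s != std::string::npos &&
           rendered.find(end, s + start.size()) != std::string::npos;
}

// Classify from the two renderings (thinking enabled vs disabled).
content_mode_type classify_content_mode(const std::string & with_thinking,
                                        const std::string & without_thinking,
                                        const std::string & start,
                                        const std::string & end) {
    const bool on  = is_wrapped(with_thinking, start, end);
    const bool off = is_wrapped(without_thinking, start, end);
    if (on && off) {
        return CONTENT_ALWAYS_WRAPPED;
    }
    if (on) {
        return CONTENT_WRAPPED_WITH_REASONING;
    }
    return CONTENT_PLAIN;  // assumption: "wrapped only without thinking" is treated as plain
}
```

The case where wrapping appears only with thinking disabled is not described above; this sketch treats it as CONTENT_PLAIN, which is an assumption.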

Reasoning Mode Detection

REASONING_FORCED_OPEN:
- Explicit: Prompt ends with reasoning start marker (e.g., <think>).
- Implicit: reasoning end marker is present but start marker is empty (e.g., [BEGIN FINAL RESPONSE]).
REASONING_OPTIONAL: Markers present but not forced.
REASONING_NONE: No markers detected.
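
These three rules can be expressed as one small decision function. The sketch below mirrors `reasoning_mode_type`; `classify_reasoning_mode` is a hypothetical name, and ignoring trailing whitespace when testing the prompt ending is an assumption.

```cpp
#include <cstddef>
#include <string>

enum reasoning_mode_type {
    REASONING_NONE,
    REASONING_OPTIONAL,
    REASONING_FORCED_OPEN,
};

// Classify the reasoning mode from the rendered prompt and the detected markers.
reasoning_mode_type classify_reasoning_mode(const std::string & prompt,
                                            const std::string & start,
                                            const std::string & end) {
    if (start.empty() && end.empty()) {
        return REASONING_NONE;
    }
    if (start.empty()) {
        return REASONING_FORCED_OPEN;  // implicit: end marker only
    }
    // Explicit: prompt ends with the start marker (trailing whitespace ignored, an assumption).
    const std::size_t last = prompt.find_last_not_of(" \t\r\n");
    const std::size_t len  = (last == std::string::npos) ? 0 : last + 1;
    if (len >= start.size() && prompt.compare(len - start.size(), start.size(), start) == 0) {
        return REASONING_FORCED_OPEN;
    }
    return REASONING_OPTIONAL;
}
```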

Phase 2: Tool Call Structure Analysis

Differential Analysis Algorithm

Test Payload Strategy:

Base: User + Assistant with content only (no tools)
Tool 1: User + Assistant with tool_calls (empty args)
Tool 2: User + Assistant with tool_calls (with args)
Tool 3: User + Assistant with multiple tool calls

Pattern Extraction Process:

Compute string differences between base and tool outputs
Use test_function_name as reliable search anchor (using rfind for last occurrence)
Extract structural elements:
- tool_call_opener: Common prefix before function name
- tool_call_closer: Common suffix after function calls
- function_opener: Tag immediately before function name
- function_closer: Tag after function content
- parameter_key_prefix/suffix: Argument wrapping patterns
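
The anchor step can be illustrated with a short sketch that splits a diff around the last occurrence of the test function name. `call_split` and `split_at_function_name` are hypothetical names; the real extraction derives several more elements from the two halves.

```cpp
#include <cstddef>
#include <string>

struct call_split {
    bool        found = false;
    std::string before;  // text preceding the last occurrence of the function name
    std::string after;   // text following it
};

// Split `diff` around the last occurrence of `name` (rfind, as described above).
call_split split_at_function_name(const std::string & diff, const std::string & name) {
    call_split out;
    if (name.empty()) {
        return out;
    }
    const std::size_t pos = diff.rfind(name);
    if (pos == std::string::npos) {
        return out;
    }
    out.found  = true;
    out.before = diff.substr(0, pos);
    out.after  = diff.substr(pos + name.size());
    return out;
}
```

For a diff of `<tool_call>\n<function=test_function_name>\n</function>\n</tool_call>`, `before` ends with `<function=` (a candidate function_opener) and `after` starts with `>` followed by the closers.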

Format Classification Logic

FORMAT_JSON_NATIVE:

Detected by {"name": pattern in tool_call_opener
Or XML markers with JSON structure

FORMAT_XML_CONSTRUCTED:

function_opener starts with <
No substantial parameter markers

FORMAT_RECIPIENT_BASED:

tool_call_start_marker == function_opener
No parameter markers
Opener doesn't start with structural chars

FORMAT_BRACKET_TAG:

function_name_suffix contains bracket tags like [CALL_ID]...[ARGS]
tool_call_start_marker matches [TOOL_CALLS] pattern

FORMAT_PREFIXED_INDEXED:

function_opener ends with . (namespace separator)
function_name_suffix starts with : followed by digit
Example: functions.name:0<|tool_call_argument_begin|>
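
As one concrete instance of these rules, the FORMAT_PREFIXED_INDEXED test can be sketched as a predicate over the two extracted strings. `looks_prefixed_indexed` is a hypothetical name.

```cpp
#include <cctype>
#include <string>

// True when the opener ends with a namespace separator ('.') and the text
// after the function name starts with ':' followed by a digit.
bool looks_prefixed_indexed(const std::string & function_opener,
                            const std::string & function_name_suffix) {
    const bool has_namespace = !function_opener.empty() && function_opener.back() == '.';
    const bool has_index =
        function_name_suffix.size() >= 2 &&
        function_name_suffix[0] == ':' &&
        std::isdigit(static_cast<unsigned char>(function_name_suffix[1])) != 0;
    return has_namespace && has_index;
}
```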

Specialized Format Handling

FUNC_PREFIXED_INDEXED (Kimi-K2):

Splits function_opener at last > to get per_call_start + function_namespace
Extracts args_marker from function_name_suffix
Derives per_call_end by matching structural patterns in tool_call_closer
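
The first step, splitting function_opener at the last `>`, might look like the sketch below. `opener_parts` and `split_function_opener` are hypothetical names, and treating an opener without any `>` as pure namespace is an assumption.

```cpp
#include <cstddef>
#include <string>

struct opener_parts {
    std::string per_call_start;      // e.g. "<|tool_call_begin|>"
    std::string function_namespace;  // e.g. "functions."
};

// Split at the last '>' so the marker stays intact and the remainder is the namespace.
opener_parts split_function_opener(const std::string & function_opener) {
    opener_parts parts;
    const std::size_t pos = function_opener.rfind('>');
    if (pos == std::string::npos) {
        parts.function_namespace = function_opener;  // assumption: no marker present
        return parts;
    }
    parts.per_call_start     = function_opener.substr(0, pos + 1);
    parts.function_namespace = function_opener.substr(pos + 1);
    return parts;
}
```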

FUNC_TAG_WITH_NAME (Functionary/Nemotron):

Detects nested vs non-nested formats
Uses overlap detection between tool_section_start and function_prefix
Handles double-wrapping prevention

ARGS_KEY_VALUE_TAGS (GLM-4.6):

Detects <arg_key>key</arg_key><arg_value>value</arg_value> pattern
Cleans up suffix to extract just the key closer

FUNC_RECIPIENT_BASED (Functionary v3.2):

Detects >>> recipient delimiter format
Routes to "all" for content, function name for tools
Uses same delimiter for both content and tool routing

FUNC_BRACKET_TAG (Mistral Small 3.2/Devstral):

Detects [TOOL_CALLS]function_name[ARGS]{...} pattern
Optional [CALL_ID]id marker for tool call identification
No section wrapper - each call starts independently

Generator Algorithms

Unified Parser Building

Composition Strategy:

// Standard format
sequence({ reasoning, space(), content, space(), tools, space(), content, end() })

// With section markers
sequence({ reasoning, space(), content_until(section_start), space(), tools, space(), content, end() })

// Forced thinking handling: the reasoning block is wrapped as
// optional(reasoning) when thinking_forced_open is set and tools are present

Trigger Word Detection:

Uses tool_section_start as primary trigger
Falls back to function_prefix or per_call_start
Raw JSON uses regex pattern trigger

Lazy Grammar Optimization:

Enabled by default for performance
Disabled when thinking forced open
Disabled when no clear trigger word exists
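
The trigger fallback order and the lazy-grammar conditions above can be combined into one small decision function. This is a sketch with hypothetical names (`trigger_choice`, `choose_trigger`); it leaves out the regex trigger used for raw JSON.

```cpp
#include <string>

struct trigger_choice {
    std::string word;          // literal trigger word; empty if none was found
    bool        lazy = false;  // whether the grammar is activated lazily
};

// Pick the trigger word in fallback order and decide whether lazy grammar applies.
trigger_choice choose_trigger(const std::string & tool_section_start,
                              const std::string & function_prefix,
                              const std::string & per_call_start,
                              bool thinking_forced_open) {
    trigger_choice out;
    if (!tool_section_start.empty()) {
        out.word = tool_section_start;  // primary trigger
    } else if (!function_prefix.empty()) {
        out.word = function_prefix;     // first fallback
    } else {
        out.word = per_call_start;      // second fallback (may be empty)
    }
    // Lazy by default; disabled when thinking is forced open or no trigger exists.
    out.lazy = !out.word.empty() && !thinking_forced_open;
    return out;
}
```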

Testing & Debugging

Comprehensive Test Coverage

The test suite covers:

Reasoning Models:

Qwen-QwQ-32B (forced-open thinking)
DeepSeek R1 variants (reasoning only)
IBM Granite (reasoning + tools)
ByteDance Seed-OSS (custom reasoning tags)
Ministral-3-14B-Reasoning
llama-cpp-deepseek-r1

Tool Call Formats:

JSON: Llama 3.x, Mistral Nemo, Hermes, MiMo-VL
XML: Nemotron, Qwen3-Coder, MiniMax
Tagged: GLM-4.6 (key-value tags)
Bracket-tag: Mistral Small 3.2, Devstral
Prefixed-indexed: Kimi-K2 variants
Name-as-key: Apertus-8B
Recipient-based: Functionary v3.2

Edge Cases:

Streaming/partial parsing
Empty content with tools
Parallel tool calls
Forced thinking mode
Multi-byte Unicode markers
Null content handling
Multi-line code in tool arguments
Custom reasoning tags (ByteDance Seed-OSS)

Debug Tools

Template Debugger: tests/debug-template-parser.cpp

Usage: ./bin/debug-template-parser path/to/template.jinja
Shows detected format, markers, generated parser, and GBNF grammar

Debug Logging: Enable with LLAMA_LOG_VERBOSITY=2

Shows detailed analysis steps
Displays pattern extraction results
Lists generated parser structure

PEG Test Builder: Fluent API for creating test cases

auto tst = peg_tester("template.jinja");
tst.test("input")
   .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
   .tools({tool})
   .expect(expected_message)
   .run();

Adding Support for New Templates

To support a new template format:

If it follows standard patterns - The auto-parser should detect it automatically
If it has unique markers - Add the markers to the detection patterns in:
- detect_reasoning_markers() for reasoning tags
- detect_content_markers() for content wrappers
- extract_patterns_from_differences() for tool call patterns
If it needs special handling - Add a dedicated handler in chat.cpp before the auto-parser block

Edge Cases and Quirks

Forced Thinking: If enable_thinking is true and the rendered prompt already ends with an open reasoning tag (e.g., <think>), the parser enters "forced thinking" mode, where it immediately expects reasoning content.
Ambiguous Content: Templates that mix content and tool calls without clear delimiters are hard to analyze reliably. The analyzer looks for start/end patterns that are common across multiple rendered examples to stay robust.
Double Wrapping: Some templates (e.g., Functionary) use the same string for both the tool section start and the function prefix (e.g., <function=). The analyzer detects this overlap and prevents double-wrapping in the generated parser.
Null Content Rendering: Some templates render null content as the Python string "None". The analyzer detects this and patches the content to an empty string.
Multi-byte Unicode Markers: Some templates use special Unicode characters in markers that require careful handling in GBNF generation.

State of the Autoparser (Jan 2026)

As of January 2026, the unified auto-parser successfully handles major template families including DeepSeek V3/R1, Llama 3.x (native JSON), GLM-4/4.6, and standard XML/JSON formats. It also supports Functionary v3.1/v3.2, Mistral variants, and specialized formats like Kimi-K2's prefixed-indexed structure.

Tested Templates

The following templates have active tests in tests/test-chat.cpp:

| Template | Format | Notes |
| --- | --- | --- |
| DeepSeek V3.1 | `FUNC_JSON_OBJECT` | Forced thinking mode |
| DeepSeek R1 Distill (Llama/Qwen) | Reasoning only | Forced-open thinking |
| llama-cpp-deepseek-r1 | Reasoning only | Forced-open thinking |
| GLM-4.6 | `ARGS_KEY_VALUE_TAGS` | `<tool_call>name\n<arg_key>...<arg_value>...` format |
| Kimi-K2 / Kimi-K2-Instruct / Kimi-K2-Thinking | `FUNC_PREFIXED_INDEXED` | `functions.name:0` with special markers |
| Apertus-8B-Instruct | `FUNC_NAME_AS_KEY` | `{"function_name": {...}}` format |
| MiniMax-M2 | `FUNC_TAG_WITH_NAME` | XML invoke with parameter tags |
| NVIDIA-Nemotron-Nano-v2 | `FUNC_JSON_OBJECT` | `<TOOLCALL>` wrapper (nested) |
| Mistral-Nemo-Instruct-2407 | `FUNC_JSON_OBJECT` | `[TOOL_CALLS]` wrapper with id field |
| Functionary v3.1 | `FUNC_TAG_WITH_NAME` | `<function=X>` non-nested format |
| Functionary v3.2 | `FUNC_RECIPIENT_BASED` | `>>>` recipient delimiter format |
| MiMo-VL / Hermes 3 / Qwen 2.5 | `FUNC_JSON_OBJECT` | `<tool_call>` wrapper |
| Apriel 1.5 | `FUNC_JSON_OBJECT` | `<tool_calls>` wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning only | Implicit reasoning start |
| Cohere Command-R7B | `FUNC_JSON_OBJECT` | `START_RESPONSE/ACTION/THINKING` markers |
| Mistral Small 3.2 | `FUNC_BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` with ID |
| Devstral | `FUNC_BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` without ID |
| Ministral-3-14B-Reasoning | Custom reasoning | `[THINK]...[/THINK]` tags |
| IBM Granite | `FUNC_JSON_OBJECT` | `<think></think>` + `<response></response>` |
| ByteDance Seed-OSS | `FUNC_TAG_WITH_NAME` | Custom `<seed:think>` and `<seed:tool_call>` tags |
| Qwen3-Coder | `FUNC_TAG_WITH_NAME` | XML-style tool format |
| Cohere Command-R Plus | `FUNC_MARKDOWN_CODE_BLOCK` | `` Action:\n```json\n[...]\n``` `` format |

Currently Unsupported Templates

| Template Family | Model / Variant | Issue Description |
| --- | --- | --- |
| OpenAI | `GPT-OSS` | Complex channel markers need new format |

Templates Without Tool Support

Some templates genuinely don't support tool calls (this is not a detection bug):

Phi 3.5 Mini - The official template has no tool handling. Use Phi-4-mini-instruct for function calling, or community fine-tuned versions.
Google Gemma 2 2B - Pure instruction-following model without tool capabilities.

TODO / Roadmap

Fix OpenAI GPT-OSS: Add FUNC_CHANNEL_BASED format for channel marker structure.
~~Fix Cohere Command-R Plus~~: Added FUNC_MARKDOWN_CODE_BLOCK format for the `` Action:\n```json `` structure.

Recent Additions (Dec 2025 - Jan 2026)

FUNC_RECIPIENT_BASED: Support for Functionary v3.2's >>> recipient delimiter format
FUNC_BRACKET_TAG: Support for Mistral Small 3.2 and Devstral's [TOOL_CALLS]... format
Enhanced Content Detection: Better handling of custom reasoning tags and content wrappers
Improved Streaming Support: Better handling of partial parsing for all supported formats
Custom Tag Support: Support for non-standard reasoning tags like <seed:think> (ByteDance)
Multi-line Tool Arguments: Better parsing of complex tool arguments with code blocks
FUNC_MARKDOWN_CODE_BLOCK: Support for Cohere Command-R Plus markdown code block format
Implicit Reasoning Support: Support for templates where reasoning starts implicitly without a start marker.

The auto-parser now handles 25+ template formats across reasoning-only, tool-calling, and hybrid models, with tests covering both streaming and non-streaming parsing.
