llama.cpp/docs/autoparser.md

26 KiB

Unified Auto-Parser Architecture

The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.

Overview

The unified auto-parser uses a pure differential, compositional approach to analyze chat templates:

Core Philosophy:

  • Zero Hardcoded Patterns: All markers extracted through template comparison (the only heuristic is JSON detection)
  • Compositional Architecture: Separate parsers for reasoning, content, and tools that compose cleanly
  • Variant Types: Structural descriptions (strings) instead of forced enum classification

Two-Phase Analysis:

  1. Phase 1: Content & Reasoning Analysis - Analyzes how the template handles basic content and reasoning, without considering tools
  2. Phase 2: Tool Call Analysis - Analyzes tool calling patterns, layered on top of Phase 1

Data Structures

content_structure (Phase 1 Result)

Describes how the template handles content and reasoning:

struct content_structure {
    enum reasoning_mode_type {
        REASONING_NONE,         // No reasoning markers detected
        REASONING_OPTIONAL,     // <think>...</think> may appear before content
        REASONING_FORCED_OPEN,  // Template ends with open reasoning tag OR starts implicitly (empty start, present end)
    };

    reasoning_mode_type reasoning_mode = REASONING_NONE;
    std::string         reasoning_start;  // e.g., "<think>", "<|START_THINKING|>"
    std::string         reasoning_end;    // e.g., "</think>", "<|END_THINKING|>"

    // Content wrapping mode
    enum content_mode_type {
        CONTENT_PLAIN,                   // No content markers
        CONTENT_ALWAYS_WRAPPED,          // <response>...</response> always present
        CONTENT_WRAPPED_WITH_REASONING,  // Content wrapped only when reasoning present
    };

    content_mode_type content_mode = CONTENT_PLAIN;
    std::string       content_start;  // e.g., "<response>", "<|START_RESPONSE|>"
    std::string       content_end;    // e.g., "</response>", "<|END_RESPONSE|>"
};

diff_analysis_result (Analysis Result)

The result of differential analysis contains all extracted markers and format classifications:

struct diff_analysis_result {
    // Classification results
    reasoning_mode  reasoning = reasoning_mode::NONE;
    content_mode    content   = content_mode::PLAIN;
    tool_format     tools     = tool_format::NONE;
    argument_format args      = argument_format::JSON;

    // All extracted markers (see marker_registry below)
    marker_registry markers;

    // JSON field names (for JSON-based formats)
    std::string name_field = "name";
    std::string args_field = "arguments";
    std::string id_field;

    // Flags
    bool supports_tools           = false;
    bool supports_parallel_calls  = false;
    bool requires_nonnull_content = false;

    // Preserved tokens for tokenizer
    std::vector<std::string> preserved_tokens;
};

marker_registry (Extracted Markers)

All markers are extracted via differential analysis without hardcoded patterns:

struct marker_registry {
    // === Reasoning markers ===
    std::string reasoning_start;  // e.g., "<think>", "[THINK]", "<|START_THINKING|>"
    std::string reasoning_end;    // e.g., "</think>", "[/THINK]", "<|END_THINKING|>"

    // === Content markers ===
    std::string content_start;  // e.g., "<response>", ">>>all\n"
    std::string content_end;    // e.g., "</response>"

    // === Tool section markers ===
    std::string tool_section_start;  // e.g., "<tool_call>", "[TOOL_CALLS]"
    std::string tool_section_end;    // e.g., "</tool_call>", "]"
    std::string per_call_start;      // e.g., "\u2985" (for multi-call templates)
    std::string per_call_end;        // e.g., " \u2985"
    std::string call_separator;      // e.g., ",", "\n"

    // === Function markers ===
    std::string func_name_prefix;  // e.g., "<function=", "\"name\": \""
    std::string func_name_suffix;  // e.g., ">", "\""
    std::string func_close;        // e.g., "</function>"
    std::string args_start;        // e.g., "{", " \u300b"
    std::string args_end;          // e.g., "}", ""

    // === Argument markers (for tagged args format) ===
    std::string arg_name_prefix;   // e.g., "<param=", "<arg_key>"
    std::string arg_name_suffix;   // e.g., ">", "</arg_key>"
    std::string arg_value_prefix;  // e.g., "", "<arg_value>"
    std::string arg_value_suffix;  // e.g., "</param>", "</arg_value>"
    std::string arg_separator;

    // === Special markers ===
    std::string code_block_marker;    // e.g., "Action:" (markdown code block format)
    std::string id_marker;            // e.g., "[CALL_ID]" (bracket-tag format)
    std::string function_namespace;   // e.g., "functions." (prefixed-indexed format)
};

Tool Calling Formats

The auto-parser recognizes three primary tool calling formats. Other formats may be deprecated in future versions.

JSON_NATIVE

Structure: The entire tool call (function name, arguments, and values) is in JSON format. There may be enclosing tags around the tool calling section.

Characteristics:

  • Function name is a JSON field: "name": "function_name"
  • Arguments are a JSON object: "arguments": {"key": "value"}
  • May be wrapped in section markers like <tool_call>...</tool_call> or [TOOL_CALLS]...]

Examples:

Standard OpenAI-style:

<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris", "unit": "celsius"}}
</tool_call>

Mistral Nemo with array wrapper:

[TOOL_CALLS]
[{"name": "calculate", "arguments": {"expr": "2+2"}}]

Hermes-style with tool_calls wrapper:

<tool_calls>
{"name": "search", "arguments": {"query": "llama.cpp"}}
</tool_calls>

Detection: args_start == "{", args_end == "}", no function name prefix markers


TAG_WITH_JSON

Structure: The function name is outside the JSON structure, typically within quasi-XML markers. Arguments are still provided as a JSON object.

Characteristics:

  • Function name appears in tag attributes: <function=function_name> or <tool_call name="function_name">
  • Arguments are a JSON object following the tag
  • Has closing tags: </function> or </tool_call>
  • Arguments remain valid JSON

Examples:

Nemotron-style:

<TOOLCALL>get_weather{"location": "Paris"}</TOOLCALL>

Functionary v3.1:

<function=get_weather>{"location": "Paris", "unit": "celsius"}</function>

ByteDance Seed-OSS:

<seed:tool_call>
<tool_name>get_weather</tool_name>
<parameters>{"location": "Paris"}</parameters>
</seed:tool_call>

MiniMax:

<minimax:tool_call>
<tool_name>calculate</tool_name>
<arguments>{"expr": "2+2"}</arguments>
</minimax:tool_call>

Detection: func_name_prefix starts with <, args_start == "{", arguments are JSON


TAG_WITH_TAGGED

Structure: Both the function name AND argument names are in XML-style tags. Argument values may be JSON or unquoted primitives depending on schema type.

Characteristics:

  • Function name in tag: <function=name> or <invoke=name>
  • Each argument has its own tag: <param=key>value</param>
  • String values are unquoted (raw text content of the tag)
  • Non-string values (objects, arrays, numbers, booleans) are still JSON-formatted
  • Supports streaming: partial arguments can be parsed incrementally

Examples:

Qwen/Hermes XML format:

<function=get_weather>
<param=location>Paris</param>
<param=unit>celsius</param>
</function>

Note how string values (Paris, celsius) are unquoted inside the tags.

Mixed types example:

<function=calculate>
<param=expr>2+2</param>
<param=precision>2</param>
<param=options>{"round": true}</param>
</function>

Here:

  • expr and precision are strings (unquoted)
  • options is an object (JSON-formatted inside the tag)

Detection: arg_name_prefix is non-empty, arguments use tagged format rather than JSON object


Other Formats (To Be Deprecated)

The following formats are currently supported but will likely be deprecated:

Format Description Example
BRACKET_TAG Bracket-based markers [TOOL_CALLS]func[ARGS]{...}
PREFIXED_INDEXED Namespace prefix with index functions.name:0{...}
RECIPIENT_BASED Recipient routing >>>recipient\n{content}
MARKDOWN_BLOCK Markdown code blocks Action:\n\``json\n[...]`

Analysis Flow

Template
    |
    v
Phase 1: analyze_content_structure()
    |-- detect_reasoning_markers() - compare outputs with reasoning_content vs without
    |-- detect_content_markers() - render with content and detect wrapping
    |-- detect_reasoning_mode() - check if prompt ends with open tag
    |
    v
content_structure
    |
    v
Phase 2: analyze_tool_structure()
    |-- Check minja.supports_tool_calls
    |-- Differential analysis for tool patterns
    |-- Classify function format (JSON vs tagged)
    |-- Classify argument format (JSON vs tagged)
    |
    v
diff_analysis_result
    |
    v
generate_parser(diff_analysis_result)
    |-- build_reasoning_block(diff_analysis_result)
    |-- build_content_block(diff_analysis_result)
    |-- build_tool_section(diff_analysis_result, tools)
    |-- Compose into final parser
    |
    v
common_chat_params (parser, grammar, triggers, preserved_tokens)

Entry Point

The mechanism starts in common/chat.cpp, in common_chat_templates_apply_jinja:

// 1. Analyze the template (two-phase)
auto analysis = differential_analyzer::analyze(tmpl);

// 2. Generate the parser and grammar
auto auto_params = universal_peg_generator::generate_parser(tmpl, params);

// 3. Use if it provides more than basic content handling
if (auto_params.format != COMMON_CHAT_FORMAT_CONTENT_ONLY ||
    !auto_params.parser.empty()) {
    return auto_params;
}

Builder Methods

The unified builder (common_chat_peg_unified_builder) provides high-level methods:

  • build_reasoning_block(analysis, reasoning_format, thinking_forced_open) - Build reasoning parser
  • build_content_block(analysis, reasoning_format) - Build content parser
  • build_tool_section(analysis, tools, parallel_tool_calls, force_tool_calls) - Build tool section
  • build_function(analysis, name, schema) - Build single function parser
  • build_arguments(analysis, schema) - Build arguments parser

Key Templates Supported

  • Granite - <think></think> + <response></response> with tool calls
  • Nemotron - JSON tools with <TOOLCALL> wrapper
  • Qwen/Hermes - XML-style <function=X><param=key> format (TAG_WITH_TAGGED)
  • Command-R7B - <|START_THINKING|>/<|START_RESPONSE|> + <|START_ACTION|> tools
  • DeepSeek R1 - Forced thinking + complex tools
  • Mistral Nemo - [TOOL_CALLS] wrapper (JSON_NATIVE)
  • MiniMax - <minimax:tool_call> wrapper with JSON args (TAG_WITH_JSON)
  • GLM-4.6 - <minimax:tool_call> + <tool_call>name\n<arg_key>...<arg_value>... format
  • Kimi-K2 - PREFIXED_INDEXED format with namespace and indices
  • Mistral Small 3.2 - BRACKET_TAG format with [TOOL_CALLS] markers
  • Functionary v3.2 - RECIPIENT_BASED format with >>> routing

Files

File Purpose
common/chat-auto-parser.h Data structures and API declarations
common/chat-diff-analyzer.h/cpp Differential analysis implementation
common/chat-auto-parser-generator.cpp PEG parser generator
common/chat-auto-parser-helpers.h/cpp Shared helper functions
common/chat-peg-parser.h/cpp Unified builder and mapper classes
common/chat.cpp Main entry point and wire-up

Algorithm Details

Phase 1: Content & Reasoning Analysis

Reasoning Detection (4 Methods)

Method 1: Differential Reasoning Content Analysis

  • Render template with reasoning_content field present vs absent
  • Compare outputs to find markers between reasoning and content
  • If only closing tag found, derive opening tag using patterns:
    • XML: </tag><tag>
    • Special tokens: <|END_X|><|START_X|>, <|/X|><|X|>
  • Handles various tag formats including XML and special token formats

Method 2: Enable-Thinking Toggle Analysis

  • Toggle enable_thinking context variable between true/false
  • Detects differences in generated prompts
  • Handles two scenarios:
    • Normal case: enable_thinking=true adds reasoning markers
    • Reverse case: enable_thinking=false adds empty thinking block (GLM-4.6 style)
  • Uses string difference analysis to extract markers
  • Validates extracted tags against blacklist of role markers

Method 3: Prompt Ending Analysis

  • Checks if prompt ends with unclosed reasoning tag
  • Looks for trailing tags in prompt with enable_thinking=true
  • Differentiates between open tags (<think>) and close tags (</think>)
  • Handles blacklisted tags (role markers, system tokens)
  • Validates reasoning-like patterns (contains "think", "reason", "thought")

Method 4: Adjacent Tag Pair Detection

  • Looks for patterns like <minimax:tool_call></think>, <|START_THINKING|><|END_THINKING|>, [think][/think]
  • Searches for predefined tag patterns in prompt
  • Validates tags are adjacent with only whitespace between
  • Supports both simple and complex token formats

Content Detection Algorithm

  1. Dual-Mode Rendering: Render template with content marker in both thinking-enabled and thinking-disabled modes
  2. Pattern Matching: Search for known content wrapper patterns:
    • <|START_RESPONSE|> / <|END_RESPONSE|>
    • <response> / </response>
    • <output> / </output>
    • <answer> / </answer>
    • <|CHATBOT_TOKEN|> / <|END_OF_TURN_TOKEN|>
  3. Mode Classification:
    • CONTENT_ALWAYS_WRAPPED: Found in both thinking modes
    • CONTENT_WRAPPED_WITH_REASONING: Found only with thinking enabled
    • CONTENT_PLAIN: No wrapping detected

Reasoning Mode Detection

  • REASONING_FORCED_OPEN:
    • Explicit: Prompt ends with reasoning start marker (e.g., <think>).
    • Implicit: reasoning end marker is present but start marker is empty (e.g., [BEGIN FINAL RESPONSE]).
  • REASONING_OPTIONAL: Markers present but not forced.
  • REASONING_NONE: No markers detected.

Phase 2: Tool Call Structure Analysis

Pure Differential Analysis Algorithm

Key Principle: All patterns are extracted through template comparison. The only heuristic is detecting JSON vs marker-based structures (via JSON parse attempt). No hardcoded pattern lists.

Comparison Matrix:

Comparison Purpose What's Extracted
T1: No tools vs tools Tool section markers tool_section_start, tool_section_end
T2: 1 call vs 2 calls Call separators per_call_start, call_separator
T3: func_alpha vs func_beta Function boundaries func_name_prefix, func_name_suffix
T4: 1 arg vs 2 args Argument separator arg_separator
T5: No args vs args Args container args_start, args_end
A1: key1 vs key2 Arg name boundaries arg_name_prefix, arg_name_suffix
A2: value A vs B Arg value boundaries arg_value_prefix, arg_value_suffix
A3: number vs string Quoting behavior Value type handling

Structural Extraction Helpers:

// Extract last structural marker from string (finds last <, [, {, or ")
std::string extract_structural_suffix(const std::string & str);

// Extract first structural marker from string (finds first >, ], }, or ")
std::string extract_structural_prefix(const std::string & str);

// The only heuristic: detect if content is valid JSON
bool is_json_based(const std::string & content);

Pattern Extraction Process (Example - T1: Tool Section Markers):

  1. Render template with/without tool calls
  2. Compute diff: calculate_diff_split(output_no_tools, output_with_tools)
  3. Use controlled function name (func_alpha) as anchor in diff.right
  4. Extract structural prefix before function name → tool_section_start
  5. Extract structural suffix after tool content → tool_section_end

No Pattern Lists: Unlike the old approach, there are no hardcoded lists like ["<tool_call>", "[TOOL_CALLS]", ...]. All markers are discovered through differential comparison.

Variant Detection Logic

Instead of forcing patterns into enum types, the analyzer detects variant types as strings that describe the structural characteristics:

Variant Types:

  • "json-native": Pure JSON tool calls (Llama, Mistral Nemo)
  • "tagged-json": Function name in markers, args in JSON (Functionary v3.1, Nemotron)
  • "tagged-args": Full XML-style with tagged arguments (Qwen, Hermes, MiniMax)
  • "bracket-tag": Bracket markers (Mistral Small 3.2: [TOOL_CALLS]func[ARGS]{...})
  • "recipient-based": Recipient routing (Functionary v3.2: >>>func_name)
  • "markdown-block": Markdown code blocks (Cohere Command-R Plus)
  • "prefixed-indexed": Namespace prefix with indices (Kimi-K2: functions.name:0)

Detection Strategy (from most to least distinctive):

void detect_tool_variant(diff_analysis_result & result) {
    // 1. Check for unique markers (most distinctive)
    if (!result.markers.id_marker.empty())
         "bracket-tag"

    if (markers contain ">>>")
         "recipient-based"

    if (code_block_marker present)
         "markdown-block"

    if (function_namespace or suffix contains ':')
         "prefixed-indexed"

    // 2. Check argument structure (JSON variants)
    if (arg_name_prefix starts with '<')
         "tagged-args"

    if (func_name_prefix starts with '<')
         "tagged-json"

    // 3. Default
     "json-native"
}

Compositional Parser Building

The analyzer builds separate, composable parsers for each component:

Reasoning Parser:

  • Built from reasoning_start and reasoning_end markers
  • Supports tag-based, delimiter, and forced-open modes

Content Parser:

  • Built from content_start and content_end markers
  • Supports plain, always-wrapped, and conditionally-wrapped modes

Tool Parser (variant-specific):

  • Built based on variant_type detection
  • Each variant has its own builder that uses the extracted markers
  • No enum forcing - structure preserved as discovered

Final Composition:

sequence({
    reasoning_parser,
    space(),
    content_parser,
    space(),
    tool_parser,
    end()
})

Generator Algorithms

Unified Parser Building

Composition Strategy:

// Standard format
sequence({ reasoning, space(), content, space(), tools, space(), content, end() })

// With section markers
sequence({ reasoning, space(), content_until(section_start), space(), tools, space(), content, end() })

// Forced thinking handling
optional(reasoning) when thinking_forced_open && tools present

Trigger Word Detection:

  • Uses tool_section_start as primary trigger
  • Falls back to function_prefix or per_call_start
  • Raw JSON uses regex pattern trigger

Lazy Grammar Optimization:

  • Enabled by default for performance
  • Disabled when thinking forced open
  • Disabled when no clear trigger word exists

Testing & Debugging

Comprehensive Test Coverage

The test suite covers:

Reasoning Models:

  • Qwen-QwQ-32B (forced-open thinking)
  • DeepSeek R1 variants (reasoning only)
  • IBM Granite (reasoning + tools)
  • ByteDance Seed-OSS (custom reasoning tags)
  • Ministral-3-14B-Reasoning
  • llama-cpp-deepseek-r1

Tool Call Formats:

  • JSON_NATIVE: Llama 3.x, Mistral Nemo, Hermes, MiMo-VL
  • TAG_WITH_JSON: Nemotron, Qwen3-Coder, MiniMax
  • TAG_WITH_TAGGED: Qwen, Hermes (XML), ByteDance Seed-OSS
  • BRACKET_TAG: Mistral Small 3.2, Devstral
  • PREFIXED_INDEXED: Kimi-K2 variants
  • RECIPIENT_BASED: Functionary v3.2
  • MARKDOWN_BLOCK: Cohere Command-R Plus

Edge Cases:

  • Streaming/partial parsing
  • Empty content with tools
  • Parallel tool calls
  • Forced thinking mode
  • Multi-byte Unicode markers
  • Null content handling
  • Multi-line code in tool arguments
  • Custom reasoning tags (ByteDance Seed-OSS)

Debug Tools

Template Debugger: tests/debug-template-parser.cpp

  • Usage: ./bin/debug-template-parser path/to/template.jinja
  • Shows detected format, markers, generated parser, and GBNF grammar

Debug Logging: Enable with LLAMA_LOG_VERBOSITY=2

  • Shows detailed analysis steps
  • Displays pattern extraction results
  • Lists generated parser structure

PEG Test Builder: Fluent API for creating test cases

auto tst = peg_tester("template.jinja");
tst.test("input")
   .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
   .tools({tool})
   .expect(expected_message)
   .run();

Adding Support for New Templates

To support a new template format:

  1. If it follows standard patterns - The auto-parser should detect it automatically using the three main formats (JSON_NATIVE, TAG_WITH_JSON, TAG_WITH_TAGGED)
  2. If it has unique markers - Add differential analysis patterns in:
    • compare_reasoning_presence() for reasoning tags
    • compare_content_values() for content wrappers
    • extract_tool_section() for tool call patterns
  3. If it needs special handling - Add a dedicated handler in chat.cpp before the auto-parser block

Edge Cases and Quirks

  1. Forced Thinking: If enable_thinking is true but the model has already started a thought block (e.g., ended the prompt with <think>), the parser enters "forced thinking" mode where it immediately expects reasoning content.
  2. Ambiguous Content: Templates that mix content and tool calls without clear delimiters can be tricky. The analyzer tries to find "common" start/end patterns across multiple examples to be robust.
  3. Double Wrapping: Some templates (e.g., Functionary) use the same string for both the tool section start and the function prefix (e.g., <function=). The analyzer detects this overlap and prevents double-wrapping in the generated parser.
  4. Null Content Rendering: Some templates render null content as Python "None" string. The analyzer detects this and patches content to empty string.
  5. Multi-byte Unicode Markers: Some templates use special Unicode characters in markers that require careful handling in GBNF generation.

State of the Autoparser (Jan 2026)

As of January 2026, the unified auto-parser successfully handles major template families including DeepSeek V3/R1, Llama 3.x (native JSON), GLM-4/4.6, and standard XML/JSON formats. It also supports Functionary v3.1/v3.2, Mistral variants, and specialized formats like Kimi-K2's prefixed-indexed structure.

Tested Templates

The following templates have active tests in tests/test-chat.cpp:

Template Format Notes
DeepSeek V3.1 JSON_NATIVE Forced thinking mode
DeepSeek R1 Distill (Llama/Qwen) Reasoning only Forced-open thinking
llama-cpp-deepseek-r1 Reasoning only Forced-open thinking
GLM-4.6 TAGGED <tool_call>name\n<arg_key>...<arg_value>... format
Kimi-K2 / Kimi-K2-Instruct / Kimi-K2-Thinking PREFIXED_INDEXED functions.name:0 with special markers
Apertus-8B-Instruct NAME_AS_KEY {"function_name": {...}} format
MiniMax-M2 TAG_WITH_JSON XML invoke with parameter tags
NVIDIA-Nemotron-Nano-v2 JSON_NATIVE <TOOLCALL> wrapper (nested)
Mistral-Nemo-Instruct-2407 JSON_NATIVE [TOOL_CALLS] wrapper with id field
Functionary v3.1 TAG_WITH_JSON <function=X> non-nested format
Functionary v3.2 RECIPIENT_BASED >>> recipient delimiter format
MiMo-VL / Hermes 3 / Qwen 2.5 JSON_NATIVE <tool_call> wrapper
Apriel 1.5 JSON_NATIVE <tool_calls> wrapper with JSON array
Apriel 1.6 Thinker Reasoning only Implicit reasoning start
Cohere Command-R7B JSON_NATIVE START_RESPONSE/ACTION/THINKING markers
Mistral Small 3.2 BRACKET_TAG [TOOL_CALLS]func[ARGS]{...} with ID
Devstral BRACKET_TAG [TOOL_CALLS]func[ARGS]{...} without ID
Ministral-3-14B-Reasoning Custom reasoning [THINK]...[/THINK] tags
IBM Granite JSON_NATIVE <think></think> + <response></response>
ByteDance Seed-OSS TAG_WITH_TAGGED Custom <seed:think> and <seed:tool_call> tags
Qwen3-Coder TAG_WITH_TAGGED XML-style tool format
Cohere Command-R Plus MARKDOWN_BLOCK Action:\n```json\n[...]\n\`` format

Currently Unsupported Templates

Template Family Model / Variant Issue Description
OpenAI GPT-OSS Complex channel markers need new format

Templates Without Tool Support

Some templates genuinely don't support tool calls (this is not a detection bug):

  • Phi 3.5 Mini - The official template has no tool handling. Use Phi-4-mini-instruct for function calling, or community fine-tuned versions.
  • Google Gemma 2 2B - Pure instruction-following model without tool capabilities.

TODO / Roadmap

  • Fix OpenAI GPT-OSS: Add handling for channel marker structure.
  • Fix Cohere Command-R Plus: Added MARKDOWN_BLOCK format for Action:\n```json` structure.

Recent Additions (Dec 2025 - Jan 2026)

  • RECIPIENT_BASED: Support for Functionary v3.2's >>> recipient delimiter format
  • BRACKET_TAG: Support for Mistral Small 3.2 and Devstral's [TOOL_CALLS]... format
  • Enhanced Content Detection: Better handling of custom reasoning tags and content wrappers
  • Improved Streaming Support: Better handling of partial parsing for all supported formats
  • Custom Tag Support: Support for non-standard reasoning tags like <seed:think> (ByteDance)
  • Multi-line Tool Arguments: Better parsing of complex tool arguments with code blocks
  • MARKDOWN_BLOCK: Support for Cohere Command-R Plus markdown code block format
  • Implicit Reasoning Support: Support for templates where reasoning starts implicitly without a start marker.
  • Pure Differential Refactoring (Jan 2026): Complete refactoring to eliminate hardcoded patterns:
    • Removed all hardcoded pattern lists (previously had ["<tool_call>", "[TOOL_CALLS]", ...])
    • Added structural extraction helpers (extract_structural_suffix, extract_structural_prefix)
    • Replaced enum-based classification with string-based variant types
    • Only remaining heuristic: JSON detection via parse attempt
    • All markers now discovered through differential template comparison
  • Three Primary Tool Formats: Consolidated tool calling formats to JSON_NATIVE, TAG_WITH_JSON, and TAG_WITH_TAGGED for clarity and maintainability

The auto-parser now successfully handles 25+ different template formats across reasoning-only, tool-calling, and hybrid models, with comprehensive test coverage ensuring robust parsing across streaming and non-streaming scenarios.