llama.cpp/docs/autoparser.md

# Unified Auto-Parser Architecture

The auto-parser automatically analyzes chat templates to determine how to parse model outputs, including content, reasoning, and tool calls.

## Overview

The unified auto-parser uses a **pure differential, compositional approach** to analyze chat templates:

**Core Philosophy**:

- **Zero Hardcoded Patterns**: All markers extracted through template comparison (the **only heuristic** is JSON detection)
- **Compositional Architecture**: Separate parsers for reasoning, content, and tools that compose cleanly
- **Variant Types**: Structural descriptions (strings) instead of forced enum classification

**Two-Phase Analysis**:

1. **Phase 1: Content & Reasoning Analysis** - Analyzes how the template handles basic content and reasoning, without considering tools
2. **Phase 2: Tool Call Analysis** - Analyzes tool calling patterns, layered on top of Phase 1

## Data Structures

### content_structure (Phase 1 Result)

Describes how the template handles content and reasoning:

```cpp
struct content_structure {
    enum reasoning_mode_type {
        REASONING_NONE,         // No reasoning markers detected
        REASONING_OPTIONAL,     // <think>...</think> may appear before content
        REASONING_FORCED_OPEN,  // Template ends with open reasoning tag OR starts implicitly (empty start, present end)
    };

    reasoning_mode_type reasoning_mode = REASONING_NONE;
    std::string         reasoning_start;  // e.g., "<think>", "<|START_THINKING|>"
    std::string         reasoning_end;    // e.g., "</think>", "<|END_THINKING|>"

    // Content wrapping mode
    enum content_mode_type {
        CONTENT_PLAIN,                   // No content markers
        CONTENT_ALWAYS_WRAPPED,          // <response>...</response> always present
        CONTENT_WRAPPED_WITH_REASONING,  // Content wrapped only when reasoning present
    };

    content_mode_type content_mode = CONTENT_PLAIN;
    std::string       content_start;  // e.g., "<response>", "<|START_RESPONSE|>"
    std::string       content_end;    // e.g., "</response>", "<|END_RESPONSE|>"
};
```

### diff_analysis_result (Analysis Result)

The result of differential analysis contains all extracted markers and format classifications:

```cpp
struct diff_analysis_result {
    // Classification results
    reasoning_mode  reasoning = reasoning_mode::NONE;
    content_mode    content   = content_mode::PLAIN;
    tool_format     tools     = tool_format::NONE;
    argument_format args      = argument_format::JSON;

    // All extracted markers (see marker_registry below)
    marker_registry markers;

    // JSON field names (for JSON-based formats)
    std::string name_field = "name";
    std::string args_field = "arguments";
    std::string id_field;

    // Flags
    bool supports_tools           = false;
    bool supports_parallel_calls  = false;
    bool requires_nonnull_content = false;

    // Preserved tokens for tokenizer
    std::vector<std::string> preserved_tokens;
};
```

### marker_registry (Extracted Markers)

All markers are extracted via differential analysis without hardcoded patterns:

```cpp
struct marker_registry {
    // === Reasoning markers ===
    std::string reasoning_start;  // e.g., "<think>", "[THINK]", "<|START_THINKING|>"
    std::string reasoning_end;    // e.g., "</think>", "[/THINK]", "<|END_THINKING|>"

    // === Content markers ===
    std::string content_start;  // e.g., "<response>", ">>>all\n"
    std::string content_end;    // e.g., "</response>"

    // === Tool section markers ===
    std::string tool_section_start;  // e.g., "<tool_call>", "[TOOL_CALLS]"
    std::string tool_section_end;    // e.g., "</tool_call>", "]"
    std::string per_call_start;      // e.g., "\u2985" (for multi-call templates)
    std::string per_call_end;        // e.g., " \u2985"
    std::string call_separator;      // e.g., ",", "\n"

    // === Function markers ===
    std::string func_name_prefix;  // e.g., "<function=", "\"name\": \""
    std::string func_name_suffix;  // e.g., ">", "\""
    std::string func_close;        // e.g., "</function>"
    std::string args_start;        // e.g., "{", " \u300b"
    std::string args_end;          // e.g., "}", ""

    // === Argument markers (for tagged args format) ===
    std::string arg_name_prefix;   // e.g., "<param=", "<arg_key>"
    std::string arg_name_suffix;   // e.g., ">", "</arg_key>"
    std::string arg_value_prefix;  // e.g., "", "<arg_value>"
    std::string arg_value_suffix;  // e.g., "</param>", "</arg_value>"
    std::string arg_separator;

    // === Special markers ===
    std::string code_block_marker;    // e.g., "Action:" (markdown code block format)
    std::string id_marker;            // e.g., "[CALL_ID]" (bracket-tag format)
    std::string function_namespace;   // e.g., "functions." (prefixed-indexed format)
};
```

## Tool Calling Formats

The auto-parser recognizes three primary tool calling formats. Other formats may be deprecated in future versions.

### JSON_NATIVE

**Structure**: The entire tool call (function name, arguments, and values) is in JSON format. There may be enclosing tags around the tool calling section.

**Characteristics**:
- Function name is a JSON field: `"name": "function_name"`
- Arguments are a JSON object: `"arguments": {"key": "value"}`
- May be wrapped in section markers like `<tool_call>...</tool_call>` or `[TOOL_CALLS]...]`

**Examples**:

Standard OpenAI-style:
```json
<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris", "unit": "celsius"}}
</tool_call>
```

Mistral Nemo with array wrapper:
```json
[TOOL_CALLS]
[{"name": "calculate", "arguments": {"expr": "2+2"}}]
```

Hermes-style with tool_calls wrapper:
```json
<tool_calls>
{"name": "search", "arguments": {"query": "llama.cpp"}}
</tool_calls>
```

**Detection**: `args_start == "{"`, `args_end == "}"`, no function name prefix markers

---

### TAG_WITH_JSON

**Structure**: The function name is outside the JSON structure, typically within quasi-XML markers. Arguments are still provided as a JSON object.

**Characteristics**:
- Function name appears in tag attributes: `<function=function_name>` or `<tool_call name="function_name">`
- Arguments are a JSON object following the tag
- Has closing tags: `</function>` or `</tool_call>`
- Arguments remain valid JSON

**Examples**:

Nemotron-style:
```xml
<TOOLCALL>get_weather{"location": "Paris"}</TOOLCALL>
```

Functionary v3.1:
```xml
<function=get_weather>{"location": "Paris", "unit": "celsius"}</function>
```

ByteDance Seed-OSS:
```xml
<seed:tool_call>
<tool_name>get_weather</tool_name>
<parameters>{"location": "Paris"}</parameters>
</seed:tool_call>
```

MiniMax:
```xml
<minimax:tool_call>
<tool_name>calculate</tool_name>
<arguments>{"expr": "2+2"}</arguments>
</minimax:tool_call>
```

**Detection**: `func_name_prefix` starts with `<`, `args_start == "{"`, arguments are JSON

---

### TAG_WITH_TAGGED

**Structure**: Both the function name AND argument names are in XML-style tags. Argument values may be JSON or unquoted primitives depending on schema type.

**Characteristics**:
- Function name in tag: `<function=name>` or `<invoke=name>`
- Each argument has its own tag: `<param=key>value</param>`
- String values are **unquoted** (raw text content of the tag)
- Non-string values (objects, arrays, numbers, booleans) are still JSON-formatted
- Supports streaming: partial arguments can be parsed incrementally

**Examples**:

Qwen/Hermes XML format:
```xml
<function=get_weather>
<param=location>Paris</param>
<param=unit>celsius</param>
</function>
```

Note how string values (`Paris`, `celsius`) are unquoted inside the tags.

Mixed types example:
```xml
<function=calculate>
<param=expr>2+2</param>
<param=precision>2</param>
<param=options>{"round": true}</param>
</function>
```

Here:
- `expr` and `precision` are strings (unquoted)
- `options` is an object (JSON-formatted inside the tag)

**Detection**: `arg_name_prefix` is non-empty, arguments use tagged format rather than JSON object

---

### Other Formats (To Be Deprecated)

The following formats are currently supported but will likely be deprecated:

| Format | Description | Example |
|--------|-------------|---------|
| `BRACKET_TAG` | Bracket-based markers | `[TOOL_CALLS]func[ARGS]{...}` |
| `PREFIXED_INDEXED` | Namespace prefix with index | `functions.name:0{...}` |
| `RECIPIENT_BASED` | Recipient routing | `>>>recipient\n{content}` |
| `MARKDOWN_BLOCK` | Markdown code blocks | `Action:\n\`\`\`json\n[...]` |

## Analysis Flow

```console
Template
    |
    v
Phase 1: analyze_content_structure()
    |-- detect_reasoning_markers() - compare outputs with reasoning_content vs without
    |-- detect_content_markers() - render with content and detect wrapping
    |-- detect_reasoning_mode() - check if prompt ends with open tag
    |
    v
content_structure
    |
    v
Phase 2: analyze_tool_structure()
    |-- Check minja.supports_tool_calls
    |-- Differential analysis for tool patterns
    |-- Classify function format (JSON vs tagged)
    |-- Classify argument format (JSON vs tagged)
    |
    v
diff_analysis_result
    |
    v
generate_parser(diff_analysis_result)
    |-- build_reasoning_block(diff_analysis_result)
    |-- build_content_block(diff_analysis_result)
    |-- build_tool_section(diff_analysis_result, tools)
    |-- Compose into final parser
    |
    v
common_chat_params (parser, grammar, triggers, preserved_tokens)
```

## Entry Point

The mechanism starts in `common/chat.cpp`, in `common_chat_templates_apply_jinja`:

```cpp
// 1. Analyze the template (two-phase)
auto analysis = differential_analyzer::analyze(tmpl);

// 2. Generate the parser and grammar
auto auto_params = universal_peg_generator::generate_parser(tmpl, params);

// 3. Use if it provides more than basic content handling
if (auto_params.format != COMMON_CHAT_FORMAT_CONTENT_ONLY ||
    !auto_params.parser.empty()) {
    return auto_params;
}
```

## Builder Methods

The unified builder (`common_chat_peg_unified_builder`) provides high-level methods:

- `build_reasoning_block(analysis, reasoning_format, thinking_forced_open)` - Build reasoning parser
- `build_content_block(analysis, reasoning_format)` - Build content parser
- `build_tool_section(analysis, tools, parallel_tool_calls, force_tool_calls)` - Build tool section
- `build_function(analysis, name, schema)` - Build single function parser
- `build_arguments(analysis, schema)` - Build arguments parser

## Key Templates Supported

- **Granite** - `<think></think>` + `<response></response>` with tool calls
- **Nemotron** - JSON tools with `<TOOLCALL>` wrapper
- **Qwen/Hermes** - XML-style `<function=X><param=key>` format (TAG_WITH_TAGGED)
- **Command-R7B** - `<|START_THINKING|>`/`<|START_RESPONSE|>` + `<|START_ACTION|>` tools
- **DeepSeek R1** - Forced thinking + complex tools
- **Mistral Nemo** - `[TOOL_CALLS]` wrapper (JSON_NATIVE)
- **MiniMax** - `<minimax:tool_call>` wrapper with JSON args (TAG_WITH_JSON)
- **GLM-4.6** - `<minimax:tool_call>` + `<tool_call>name\n<arg_key>...<arg_value>...` format
- **Kimi-K2** - `PREFIXED_INDEXED` format with namespace and indices
- **Mistral Small 3.2** - `BRACKET_TAG` format with `[TOOL_CALLS]` markers
- **Functionary v3.2** - `RECIPIENT_BASED` format with `>>>` routing

## Files

| File | Purpose |
|------|---------|
| `common/chat-auto-parser.h` | Data structures and API declarations |
| `common/chat-diff-analyzer.h/cpp` | Differential analysis implementation |
| `common/chat-auto-parser-generator.cpp` | PEG parser generator |
| `common/chat-auto-parser-helpers.h/cpp` | Shared helper functions |
| `common/chat-peg-parser.h/cpp` | Unified builder and mapper classes |
| `common/chat.cpp` | Main entry point and wire-up |

## Algorithm Details

### Phase 1: Content & Reasoning Analysis

#### Reasoning Detection (4 Methods)

**Method 1: Differential Reasoning Content Analysis**

- Render template with `reasoning_content` field present vs absent
- Compare outputs to find markers between reasoning and content
- If only closing tag found, derive opening tag using patterns:
  - XML: `</tag>` → `<tag>`
  - Special tokens: `<|END_X|>` → `<|START_X|>`, `<|/X|>` → `<|X|>`
- Handles various tag formats including XML and special token formats

**Method 2: Enable-Thinking Toggle Analysis**

- Toggle `enable_thinking` context variable between true/false
- Detects differences in generated prompts
- Handles two scenarios:
  - **Normal case**: enable_thinking=true adds reasoning markers
  - **Reverse case**: enable_thinking=false adds empty thinking block (GLM-4.6 style)
- Uses string difference analysis to extract markers
- Validates extracted tags against blacklist of role markers

**Method 3: Prompt Ending Analysis**

- Checks if prompt ends with unclosed reasoning tag
- Looks for trailing tags in prompt with `enable_thinking=true`
- Differentiates between open tags (`<think>`) and close tags (`</think>`)
- Handles blacklisted tags (role markers, system tokens)
- Validates reasoning-like patterns (contains "think", "reason", "thought")

**Method 4: Adjacent Tag Pair Detection**

- Looks for patterns like `<minimax:tool_call></think>`, `<|START_THINKING|><|END_THINKING|>`, `[think][/think]`
- Searches for predefined tag patterns in prompt
- Validates tags are adjacent with only whitespace between
- Supports both simple and complex token formats

#### Content Detection Algorithm

1. **Dual-Mode Rendering**: Render template with content marker in both thinking-enabled and thinking-disabled modes
2. **Pattern Matching**: Search for known content wrapper patterns:
   - `<|START_RESPONSE|>` / `<|END_RESPONSE|>`
   - `<response>` / `</response>`
   - `<output>` / `</output>`
   - `<answer>` / `</answer>`
   - `<|CHATBOT_TOKEN|>` / `<|END_OF_TURN_TOKEN|>`
3. **Mode Classification**:
   - `CONTENT_ALWAYS_WRAPPED`: Found in both thinking modes
   - `CONTENT_WRAPPED_WITH_REASONING`: Found only with thinking enabled
   - `CONTENT_PLAIN`: No wrapping detected

#### Reasoning Mode Detection

- **REASONING_FORCED_OPEN**:
  - **Explicit**: Prompt ends with reasoning start marker (e.g., `<think>`).
  - **Implicit**: reasoning end marker is present but start marker is empty (e.g., `[BEGIN FINAL RESPONSE]`).
- **REASONING_OPTIONAL**: Markers present but not forced.
- **REASONING_NONE**: No markers detected.

### Phase 2: Tool Call Structure Analysis

#### Pure Differential Analysis Algorithm

**Key Principle**: All patterns are extracted through template comparison. The **only heuristic** is detecting JSON vs marker-based structures (via JSON parse attempt). No hardcoded pattern lists.

**Comparison Matrix**:

| Comparison | Purpose | What's Extracted |
|------------|---------|------------------|
| **T1**: No tools vs tools | Tool section markers | `tool_section_start`, `tool_section_end` |
| **T2**: 1 call vs 2 calls | Call separators | `per_call_start`, `call_separator` |
| **T3**: func_alpha vs func_beta | Function boundaries | `func_name_prefix`, `func_name_suffix` |
| **T4**: 1 arg vs 2 args | Argument separator | `arg_separator` |
| **T5**: No args vs args | Args container | `args_start`, `args_end` |
| **A1**: key1 vs key2 | Arg name boundaries | `arg_name_prefix`, `arg_name_suffix` |
| **A2**: value A vs B | Arg value boundaries | `arg_value_prefix`, `arg_value_suffix` |
| **A3**: number vs string | Quoting behavior | Value type handling |

**Structural Extraction Helpers**:

```cpp
// Extract last structural marker from string (finds last <, [, {, or ")
std::string extract_structural_suffix(const std::string & str);

// Extract first structural marker from string (finds first >, ], }, or ")
std::string extract_structural_prefix(const std::string & str);

// The only heuristic: detect if content is valid JSON
bool is_json_based(const std::string & content);
```

**Pattern Extraction Process** (Example - T1: Tool Section Markers):

1. Render template with/without tool calls
2. Compute diff: `calculate_diff_split(output_no_tools, output_with_tools)`
3. Use controlled function name (`func_alpha`) as anchor in `diff.right`
4. Extract structural prefix before function name → `tool_section_start`
5. Extract structural suffix after tool content → `tool_section_end`

**No Pattern Lists**: Unlike the old approach, there are no hardcoded lists like `["<tool_call>", "[TOOL_CALLS]", ...]`. All markers are discovered through differential comparison.

#### Variant Detection Logic

Instead of forcing patterns into enum types, the analyzer detects **variant types** as strings that describe the structural characteristics:

**Variant Types**:

- `"json-native"`: Pure JSON tool calls (Llama, Mistral Nemo)
- `"tagged-json"`: Function name in markers, args in JSON (Functionary v3.1, Nemotron)
- `"tagged-args"`: Full XML-style with tagged arguments (Qwen, Hermes, MiniMax)
- `"bracket-tag"`: Bracket markers (Mistral Small 3.2: `[TOOL_CALLS]func[ARGS]{...}`)
- `"recipient-based"`: Recipient routing (Functionary v3.2: `>>>func_name`)
- `"markdown-block"`: Markdown code blocks (Cohere Command-R Plus)
- `"prefixed-indexed"`: Namespace prefix with indices (Kimi-K2: `functions.name:0`)

**Detection Strategy** (from most to least distinctive):

```cpp
void detect_tool_variant(diff_analysis_result & result) {
    // 1. Check for unique markers (most distinctive)
    if (!result.markers.id_marker.empty())
        → "bracket-tag"

    if (markers contain ">>>")
        → "recipient-based"

    if (code_block_marker present)
        → "markdown-block"

    if (function_namespace or suffix contains ':')
        → "prefixed-indexed"

    // 2. Check argument structure (JSON variants)
    if (arg_name_prefix starts with '<')
        → "tagged-args"

    if (func_name_prefix starts with '<')
        → "tagged-json"

    // 3. Default
    → "json-native"
}
```

#### Compositional Parser Building

The analyzer builds separate, composable parsers for each component:

**Reasoning Parser**:

- Built from `reasoning_start` and `reasoning_end` markers
- Supports tag-based, delimiter, and forced-open modes

**Content Parser**:

- Built from `content_start` and `content_end` markers
- Supports plain, always-wrapped, and conditionally-wrapped modes

**Tool Parser** (variant-specific):

- Built based on `variant_type` detection
- Each variant has its own builder that uses the extracted markers
- No enum forcing - structure preserved as discovered

**Final Composition**:

```cpp
sequence({
    reasoning_parser,
    space(),
    content_parser,
    space(),
    tool_parser,
    end()
})
```

### Generator Algorithms

#### Unified Parser Building

**Composition Strategy**:

```cpp
// Standard format
sequence({ reasoning, space(), content, space(), tools, space(), content, end() })

// With section markers
sequence({ reasoning, space(), content_until(section_start), space(), tools, space(), content, end() })

// Forced thinking handling
optional(reasoning) when thinking_forced_open && tools present
```

**Trigger Word Detection**:

- Uses `tool_section_start` as primary trigger
- Falls back to `function_prefix` or `per_call_start`
- Raw JSON uses regex pattern trigger

**Lazy Grammar Optimization**:

- Enabled by default for performance
- Disabled when thinking forced open
- Disabled when no clear trigger word exists

## Testing & Debugging

### Comprehensive Test Coverage

The test suite covers:

**Reasoning Models**:

- Qwen-QwQ-32B (forced-open thinking)
- DeepSeek R1 variants (reasoning only)
- IBM Granite (reasoning + tools)
- ByteDance Seed-OSS (custom reasoning tags)
- Ministral-3-14B-Reasoning
- llama-cpp-deepseek-r1

**Tool Call Formats**:

- JSON_NATIVE: Llama 3.x, Mistral Nemo, Hermes, MiMo-VL
- TAG_WITH_JSON: Nemotron, Qwen3-Coder, MiniMax
- TAG_WITH_TAGGED: Qwen, Hermes (XML), ByteDance Seed-OSS
- BRACKET_TAG: Mistral Small 3.2, Devstral
- PREFIXED_INDEXED: Kimi-K2 variants
- RECIPIENT_BASED: Functionary v3.2
- MARKDOWN_BLOCK: Cohere Command-R Plus

**Edge Cases**:

- Streaming/partial parsing
- Empty content with tools
- Parallel tool calls
- Forced thinking mode
- Multi-byte Unicode markers
- Null content handling
- Multi-line code in tool arguments
- Custom reasoning tags (ByteDance Seed-OSS)

### Debug Tools

**Template Debugger**: `tests/debug-template-parser.cpp`

- Usage: `./bin/debug-template-parser path/to/template.jinja`
- Shows detected format, markers, generated parser, and GBNF grammar

**Debug Logging**: Enable with `LLAMA_LOG_VERBOSITY=2`

- Shows detailed analysis steps
- Displays pattern extraction results
- Lists generated parser structure

**PEG Test Builder**: Fluent API for creating test cases

```cpp
auto tst = peg_tester("template.jinja");
tst.test("input")
   .reasoning_format(COMMON_REASONING_FORMAT_AUTO)
   .tools({tool})
   .expect(expected_message)
   .run();
```

## Adding Support for New Templates

To support a new template format:

1. **If it follows standard patterns** - The auto-parser should detect it automatically using the three main formats (JSON_NATIVE, TAG_WITH_JSON, TAG_WITH_TAGGED)
2. **If it has unique markers** - Add differential analysis patterns in:
   - `compare_reasoning_presence()` for reasoning tags
   - `compare_content_values()` for content wrappers
   - `extract_tool_section()` for tool call patterns
3. **If it needs special handling** - Add a dedicated handler in `chat.cpp` before the auto-parser block

## Edge Cases and Quirks

1. **Forced Thinking**: If `enable_thinking` is true but the model has already started a thought block (e.g., ended the prompt with `<think>`), the parser enters "forced thinking" mode where it immediately expects reasoning content.
2. **Ambiguous Content**: Templates that mix content and tool calls without clear delimiters can be tricky. The analyzer tries to find "common" start/end patterns across multiple examples to be robust.
3. **Double Wrapping**: Some templates (e.g., Functionary) use the same string for both the tool section start and the function prefix (e.g., `<function=`). The analyzer detects this overlap and prevents double-wrapping in the generated parser.
4. **Null Content Rendering**: Some templates render `null` content as Python "None" string. The analyzer detects this and patches content to empty string.
5. **Multi-byte Unicode Markers**: Some templates use special Unicode characters in markers that require careful handling in GBNF generation.

## State of the Autoparser (Jan 2026)

As of January 2026, the unified auto-parser successfully handles major template families including DeepSeek V3/R1, Llama 3.x (native JSON), GLM-4/4.6, and standard XML/JSON formats. It also supports Functionary v3.1/v3.2, Mistral variants, and specialized formats like Kimi-K2's prefixed-indexed structure.

### Tested Templates

The following templates have active tests in `tests/test-chat.cpp`:

| Template | Format | Notes |
|----------|--------|-------|
| DeepSeek V3.1 | `JSON_NATIVE` | Forced thinking mode |
| DeepSeek R1 Distill (Llama/Qwen) | Reasoning only | Forced-open thinking |
| llama-cpp-deepseek-r1 | Reasoning only | Forced-open thinking |
| GLM-4.6 | `TAGGED` | `<tool_call>name\n<arg_key>...<arg_value>...` format |
| Kimi-K2 / Kimi-K2-Instruct / Kimi-K2-Thinking | `PREFIXED_INDEXED` | `functions.name:0` with special markers |
| Apertus-8B-Instruct | `NAME_AS_KEY` | `{"function_name": {...}}` format |
| MiniMax-M2 | `TAG_WITH_JSON` | XML invoke with parameter tags |
| NVIDIA-Nemotron-Nano-v2 | `JSON_NATIVE` | `<TOOLCALL>` wrapper (nested) |
| Mistral-Nemo-Instruct-2407 | `JSON_NATIVE` | `[TOOL_CALLS]` wrapper with id field |
| Functionary v3.1 | `TAG_WITH_JSON` | `<function=X>` non-nested format |
| Functionary v3.2 | `RECIPIENT_BASED` | `>>>` recipient delimiter format |
| MiMo-VL / Hermes 3 / Qwen 2.5 | `JSON_NATIVE` | `<tool_call>` wrapper |
| Apriel 1.5 | `JSON_NATIVE` | `<tool_calls>` wrapper with JSON array |
| Apriel 1.6 Thinker | Reasoning only | Implicit reasoning start |
| Cohere Command-R7B | `JSON_NATIVE` | START_RESPONSE/ACTION/THINKING markers |
| Mistral Small 3.2 | `BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` with ID |
| Devstral | `BRACKET_TAG` | `[TOOL_CALLS]func[ARGS]{...}` without ID |
| Ministral-3-14B-Reasoning | Custom reasoning | `[THINK]...[/THINK]` tags |
| IBM Granite | `JSON_NATIVE` | `<think></think>` + `<response></response>` |
| ByteDance Seed-OSS | `TAG_WITH_TAGGED` | Custom `<seed:think>` and `<seed:tool_call>` tags |
| Qwen3-Coder | `TAG_WITH_TAGGED` | XML-style tool format |
| Cohere Command-R Plus | `MARKDOWN_BLOCK` | `Action:\n`\`\`\`json\n[...]\n`\`\`` format |

### Currently Unsupported Templates

| Template Family | Model / Variant | Issue Description |
|-----------------|-----------------|-------------------|
| **OpenAI** | `GPT-OSS` | Complex channel markers need new format |

### Templates Without Tool Support

Some templates genuinely don't support tool calls (this is not a detection bug):

- **Phi 3.5 Mini** - The official template has no tool handling. Use Phi-4-mini-instruct for function calling, or community fine-tuned versions.
- **Google Gemma 2 2B** - Pure instruction-following model without tool capabilities.

### TODO / Roadmap

- [ ] **Fix OpenAI GPT-OSS**: Add handling for channel marker structure.
- [x] **~~Fix Cohere Command-R Plus~~**: Added `MARKDOWN_BLOCK` format for `Action:\n`\`\`\`json` structure.

### Recent Additions (Dec 2025 - Jan 2026)

- **RECIPIENT_BASED**: Support for Functionary v3.2's `>>>` recipient delimiter format
- **BRACKET_TAG**: Support for Mistral Small 3.2 and Devstral's `[TOOL_CALLS]...` format
- **Enhanced Content Detection**: Better handling of custom reasoning tags and content wrappers
- **Improved Streaming Support**: Better handling of partial parsing for all supported formats
- **Custom Tag Support**: Support for non-standard reasoning tags like `<seed:think>` (ByteDance)
- **Multi-line Tool Arguments**: Better parsing of complex tool arguments with code blocks
- **MARKDOWN_BLOCK**: Support for Cohere Command-R Plus markdown code block format
- **Implicit Reasoning Support**: Support for templates where reasoning starts implicitly without a start marker.
- **Pure Differential Refactoring (Jan 2026)**: Complete refactoring to eliminate hardcoded patterns:
  - Removed all hardcoded pattern lists (previously had `["<tool_call>", "[TOOL_CALLS]", ...]`)
  - Added structural extraction helpers (`extract_structural_suffix`, `extract_structural_prefix`)
  - Replaced enum-based classification with string-based variant types
  - Only remaining heuristic: JSON detection via parse attempt
  - All markers now discovered through differential template comparison
- **Three Primary Tool Formats**: Consolidated tool calling formats to JSON_NATIVE, TAG_WITH_JSON, and TAG_WITH_TAGGED for clarity and maintainability

The auto-parser now successfully handles 25+ different template formats across reasoning-only, tool-calling, and hybrid models, with comprehensive test coverage ensuring robust parsing across streaming and non-streaming scenarios.