Final touches

2026-03-11 21:27:43 +01:00 · c75925b228
parent 2249a09f12
commit c75925b228
2 changed files with 29 additions and 24 deletions
--- a/common/chat-auto-parser-generator.cpp
+++ b/common/chat-auto-parser-generator.cpp
@@ -45,16 +45,15 @@ common_chat_params peg_generator::generate_parser(const common_chat_template &
    data.format           = COMMON_CHAT_FORMAT_PEG_NATIVE;
    data.preserved_tokens = autoparser.preserved_tokens;

-    // Extract reasoning prefill from the end of the rendered prompt.
-    // If the template added reasoning markers (e.g. <think> or <think></think>) at the end,
-    // store them so they can be prepended to model output before parsing.
+    // Extract reasoning prefill and detect template artifact start markers.
+    // See docs/autoparser.md "Reasoning Prefill" for details.
+    bool clear_reasoning_start = false;
    if (inputs.reasoning_format != COMMON_REASONING_FORMAT_NONE &&
        autoparser.reasoning.mode != reasoning_mode::NONE &&
        !autoparser.reasoning.end.empty()) {
        const auto & r_start = autoparser.reasoning.start;
        const auto & r_end   = autoparser.reasoning.end;

-        // Helper to trim trailing whitespace from a string
        auto rtrim = [](std::string s) {
            while (!s.empty() && (s.back() == ' ' || s.back() == '\n' ||
                                  s.back() == '\r' || s.back() == '\t')) {
@@ -63,41 +62,41 @@ common_chat_params peg_generator::generate_parser(const common_chat_template &
            return s;
        };

-        // Trim both the prompt and markers for suffix matching (markers may have trailing \n)
        auto prompt_trimmed  = rtrim(data.prompt);
        auto r_end_trimmed   = rtrim(r_end);
        auto r_start_trimmed = rtrim(r_start);

        if (!r_start_trimmed.empty()) {
-            // Check for start+end at end of prompt (e.g. <think></think>)
            if (string_ends_with(prompt_trimmed, r_end_trimmed)) {
                auto before_end = rtrim(prompt_trimmed.substr(0, prompt_trimmed.size() - r_end_trimmed.size()));
                if (string_ends_with(before_end, r_start_trimmed)) {
-                    // Prompt ends with start + end markers (reasoning closed).
-                    // Use the canonical markers from the analyzer to ensure whitespace
-                    // (e.g. trailing \n in </think>\n) is preserved, even if the template
-                    // rendered them without intermediate whitespace.
+                    // Start+end at prompt end — use canonical markers to preserve whitespace.
                    data.reasoning_prefill = r_start + r_end;
                }
            }
-            // Check for just start at end of prompt (e.g. <think>\n)
            if (data.reasoning_prefill.empty() && string_ends_with(prompt_trimmed, r_start_trimmed)) {
-                // Extract from the original prompt to preserve trailing whitespace
                auto start_pos = prompt_trimmed.size() - r_start_trimmed.size();
                data.reasoning_prefill = data.prompt.substr(start_pos);
            }
+            // Template artifact detection: start marker in prompt but not at end.
+            if (data.reasoning_prefill.empty()) {
+                auto suffix_len = std::min(data.prompt.size(), (size_t) 500);
+                auto suffix     = data.prompt.substr(data.prompt.size() - suffix_len);
+                if (suffix.find(r_start_trimmed) != std::string::npos) {
+                    clear_reasoning_start = true;
+                }
+            }
        }
    }

-    fprintf(stderr, "DEBUG reasoning_prefill: '%s' (start='%s', end='%s', mode=%d, reasoning_format=%d)\n",
-            data.reasoning_prefill.c_str(),
-            autoparser.reasoning.start.c_str(),
-            autoparser.reasoning.end.c_str(),
-            (int) autoparser.reasoning.mode,
-            (int) inputs.reasoning_format);
-
-    // Build the parser using the analysis results.
-    common_peg_arena parser = autoparser.build_parser(inputs);
+    common_peg_arena parser;
+    if (clear_reasoning_start) {
+        struct autoparser modified = autoparser;
+        modified.reasoning.start.clear();
+        parser = modified.build_parser(inputs);
+    } else {
+        parser = autoparser.build_parser(inputs);
+    }
    data.parser = parser.save();

    // Build grammar if tools are present
--- a/docs/autoparser.md
+++ b/docs/autoparser.md
@@ -50,7 +50,13 @@ All structs are defined in [common/chat-auto-parser.h](common/chat-auto-parser.h
 | `TAG_BASED`     | Tag-based: `<think>...</think>` (start can be empty for delimiter-style formats)  |
 | `TOOLS_ONLY`    | Reasoning only appears in tool call responses, not plain content                  |

-**Reasoning Prefill**: When a template adds reasoning markers (e.g., `<think>` or `<think></think>`) at the end of the prompt, these are extracted as `reasoning_prefill` and prepended to the model output before parsing. This allows the parser to always use an optional TAG_BASED pattern while correctly handling templates that force thinking mode open or closed. Whitespace-only reasoning content (from `<think></think>` prefill) is automatically discarded.
+**Reasoning Prefill**: Extracted in `generate_parser()` by inspecting the rendered prompt suffix. Three cases:
+
+1. **Start+end at prompt end** (e.g. `<think></think>`): prefill = canonical `start + end` markers (preserving the analyzer's whitespace, e.g. trailing `\n`). The parser sees reasoning as opened and immediately closed.
+2. **Just start at prompt end** (e.g. `<think>\n`): prefill = extracted from the prompt to preserve trailing whitespace. The parser sees reasoning as already opened.
+3. **Start marker in prompt suffix but not at end** (e.g. Apriel's `<|begin_assistant|>` followed by template boilerplate): the start marker is a template artifact falsely detected by the diff analyzer. It is cleared from the parser so reasoning uses delimiter-style (empty start). What separates this case from a genuinely model-generated start marker (e.g. `<think>` for Granite) is whether the marker appears in the prompt suffix at all.
+
+The prefill is prepended to model output before PEG parsing, fed to the grammar sampler via `llama_sampler_accept`, and used to determine the reasoning budget sampler's initial state (COUNTING if prefill starts with the reasoning start tokens, IDLE otherwise).

 **`content_mode`**: How the template wraps assistant content.

@@ -361,7 +367,7 @@ Each analyzer struct (`analyze_reasoning`, `analyze_content`, `analyze_tools`) i
 | `TAG_BASED` or `TOOLS_ONLY` (non-empty start) | `optional(start + reasoning(until(end)) + end)`          |
 | `TAG_BASED` or `TOOLS_ONLY` (empty start)     | `optional(reasoning(until(end)) + end)` — delimiter-style|

-Note: Templates that add reasoning markers to the prompt (e.g., `<think>`) have these extracted as `reasoning_prefill` and prepended to model output before parsing. The parser always uses the optional TAG_BASED pattern.
+Note: The start marker may be empty either because the analyzer detected delimiter-style reasoning, or because `generate_parser()` cleared a template artifact start marker (see Reasoning Prefill above). The reasoning prefill is prepended to model output before parsing.

 #### Content Parser (`analyze_content::build_parser`)

@@ -517,7 +523,7 @@ To support a new template format:

 ## Edge Cases and Quirks

-1. **Reasoning Prefill**: When `enable_thinking=true` and the model prompt ends with reasoning markers (e.g., `<think>` or `<think></think>`), these are extracted as `reasoning_prefill` and prepended to model output before parsing. The parser always uses optional TAG_BASED reasoning, so it handles both thinking and non-thinking outputs dynamically. Whitespace-only reasoning content (from closed prefill like `<think></think>`) is discarded.
+1. **Reasoning Prefill**: See the `reasoning_mode` enum section above for the full description. Key detail: template artifact detection (case 3) checks the last 500 characters of the rendered prompt for the start marker. If found but not at the very end, the start marker is cleared from the parser.
 2. **Per-Call vs Per-Section Markers**: Some templates wrap each tool call individually (`per_call_start/end`); others wrap the entire section (`section_start/end`). T2 (`check_per_call_markers()`) disambiguates by checking if the second call in a two-call output starts with the section marker.
 3. **Python Dict Format**: The Seed template family uses single-quoted JSON (`'key': 'value'`). The `uses_python_dicts` flag causes the PEG builder to register a flexible `json-string` rule accepting both quote styles before any JSON rules are built.
 4. **Tag Boundary Fixing**: `calculate_diff_split()` iteratively adjusts prefix/suffix boundaries to avoid splitting `<tag>` or `[marker]` tokens, ensuring clean extraction.