llama.cpp

Jesse Posner	db38820013	2026-02-15 23:11:14 -08:00

common : fix Step-3.5-Flash format detection and thinking support

Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder (<tool_call><function=...><parameter=...>) but its Jinja template lacks the bare <function> and plural <parameters> markers that the detection logic previously required. This caused it to fall through to Hermes 2 Pro, which doesn't call func_args_not_string(), so arguments stayed as JSON strings and templates using arguments|items crashed.

Additionally, the Qwen3-Coder-XML format handler had no thinking support. Models like Step-3.5-Flash that unconditionally emit <think> in their generation prompt need the same thinking_forced_open handling that Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content is never separated from content in API responses.

Changes:
- Relax Qwen3-Coder XML detection to only require the 3 shared markers
- Tighten Nemotron v3 branch to also require bare <function> and plural <parameters>, preventing Step-3.5-Flash from being misrouted via <think>
- Add thinking_forced_open support to Qwen3-Coder-XML init function
- Add <think>/</think> to preserved tokens
- Fix build_grammar_xml_tool_call to handle thinking_forced_open in the grammar root rule, allowing </think> before tool calls
- Add Step-3.5-Flash chat template and format detection test

Builds on: https://github.com/ggml-org/llama.cpp/pull/19283
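The detection order described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code in llama.cpp: the enum `tool_format`, the helper `has`, and the function `detect_tool_format` are hypothetical names, and the real detection logic considers other formats that are omitted here. It assumes the "3 shared markers" are `<tool_call>`, `<function=` and `<parameter=`, as the message suggests.

```cpp
#include <string>

// Hypothetical names for illustration only; these are not llama.cpp identifiers.
enum class tool_format { NEMOTRON_V3, QWEN3_CODER_XML, HERMES_2_PRO };

// True if the template source contains the given marker substring.
static bool has(const std::string & src, const char * marker) {
    return src.find(marker) != std::string::npos;
}

// Classify a Jinja template source by the markers it contains.
tool_format detect_tool_format(const std::string & src) {
    // Assumed shared markers of the XML-style tool call format.
    const bool shared = has(src, "<tool_call>") &&
                        has(src, "<function=")  &&
                        has(src, "<parameter=");
    if (!shared) {
        // Simplification: everything else falls back to Hermes 2 Pro here.
        return tool_format::HERMES_2_PRO;
    }

    // Tightened Nemotron v3 branch: <think> alone is no longer enough,
    // the bare <function> and plural <parameters> markers are also required.
    const bool bare = has(src, "<function>") && has(src, "<parameters>");
    if (bare && has(src, "<think>")) {
        return tool_format::NEMOTRON_V3;
    }

    // Relaxed branch: the shared markers are sufficient.
    return tool_format::QWEN3_CODER_XML;
}
```

Under these assumptions, a template that contains `<think>` and the three shared markers but neither bare `<function>` nor `<parameters>` (the Step-3.5-Flash case as described) is classified as Qwen3-Coder XML rather than Nemotron v3 or Hermes 2 Pro.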
templates	common : fix Step-3.5-Flash format detection and thinking support	2026-02-15 23:11:14 -08:00
.editorconfig	gguf : new file format with flexible meta data (beta) (#2398)	2023-08-21 23:07:43 +03:00
ggml-vocab-aquila.gguf	Work on the BPE tokenizer (#3252)	2023-10-03 09:16:26 +02:00
ggml-vocab-baichuan.gguf	Add more tokenizer tests (#3742)	2023-10-24 09:17:17 +02:00
ggml-vocab-bert-bge.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-bert-bge.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-bert-bge.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-command-r.gguf	command-r : add BPE pre-tokenization (#7063)	2024-05-05 08:19:30 +03:00
ggml-vocab-command-r.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-command-r.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-deepseek-coder.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-deepseek-coder.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-deepseek-coder.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-deepseek-llm.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-deepseek-llm.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-deepseek-llm.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-falcon.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-falcon.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-falcon.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-gpt-2.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-gpt-2.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-gpt-2.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-gpt-neox.gguf	Add more tokenizer tests (#3742)	2023-10-24 09:17:17 +02:00
ggml-vocab-llama-bpe.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-llama-bpe.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-llama-bpe.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-llama-spm.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-llama-spm.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-llama-spm.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-mpt.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-mpt.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-mpt.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-nomic-bert-moe.gguf	tests : improve UGM tokenizer test coverage (#13773)	2025-05-25 16:22:29 +02:00
ggml-vocab-phi-3.gguf	Per token attributes (#7685)	2024-06-04 09:17:17 +02:00
ggml-vocab-phi-3.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-phi-3.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-qwen2.gguf	llama : add BPE pre-tokenization for Qwen2 (#7114)	2024-05-08 15:06:43 +03:00
ggml-vocab-qwen2.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-qwen2.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-refact.gguf	tests : add test-tokenizer-0.sh + fix some tokenizers (#7036)	2024-05-04 08:32:32 +03:00
ggml-vocab-refact.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-refact.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-starcoder.gguf	llama : fix BPE pre-tokenization (#6920)	2024-04-29 16:58:41 +03:00
ggml-vocab-starcoder.gguf.inp	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00
ggml-vocab-starcoder.gguf.out	convert : allow partial update to the chkhsh pre-tokenizer list (#13847)	2025-05-30 12:24:37 +02:00