llama.cpp/tests
Jesse Posner 3dadc88b58
common : fix Step-3.5-Flash format detection and thinking support (#19635)
* common : fix Step-3.5-Flash format detection and thinking support

Step-3.5-Flash uses the same XML-style tool call format as Qwen3-Coder
(<tool_call><function=...><parameter=...>) but its Jinja template lacks
the bare <function> and plural <parameters> markers that the detection
logic previously required. This caused it to fall through to Hermes 2
Pro, which doesn't call func_args_not_string(), so arguments stayed as
JSON strings and templates using arguments|items crashed.
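
A minimal sketch of the detection gap described above, not the actual
llama.cpp code. The helper names are hypothetical and the marker strings are
assumptions read off this message. It shows how a strict check that also
demands the bare <function> and plural <parameters> tags misses a template
that only carries the three shared markers:

```cpp
#include <string>

// Hypothetical helper: true if the template source contains the marker.
static bool has_marker(const std::string & tmpl, const std::string & marker) {
    return tmpl.find(marker) != std::string::npos;
}

// The three markers shared by the Qwen3-Coder and Step-3.5-Flash templates
// (assumed spelling, taken from the commit message).
static bool has_shared_xml_markers(const std::string & tmpl) {
    return has_marker(tmpl, "<tool_call>") &&
           has_marker(tmpl, "<function=")  &&
           has_marker(tmpl, "<parameter=");
}

// Previous behaviour as described: additionally require the bare <function>
// and plural <parameters> tags.
static bool detect_qwen3_coder_xml_strict(const std::string & tmpl) {
    return has_shared_xml_markers(tmpl) &&
           has_marker(tmpl, "<function>") &&
           has_marker(tmpl, "<parameters>");
}

// Relaxed behaviour: the three shared markers are sufficient.
static bool detect_qwen3_coder_xml_relaxed(const std::string & tmpl) {
    return has_shared_xml_markers(tmpl);
}
```

With a template fragment that contains only <tool_call>, <function=...> and
<parameter=...>, the strict check returns false and the relaxed check
returns true.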

Additionally, the Qwen3-Coder-XML format handler had no thinking support.
Models like Step-3.5-Flash that unconditionally emit <think> in their
generation prompt need the same thinking_forced_open handling that
Nemotron v3 and Hermes 2 Pro already have, otherwise reasoning_content
is never separated from content in API responses.
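
To illustrate why the flag matters, here is a simplified sketch, not the
parser used in llama.cpp. The struct and function names are hypothetical,
and whitespace trimming and streaming are ignored. When the generation
prompt already ends with <think>, the model output starts inside the
reasoning block, so a parser that waits for an opening tag never splits it:

```cpp
#include <string>

// Hypothetical result type for this sketch.
struct parsed_message {
    std::string reasoning_content;
    std::string content;
};

// Split raw model output into reasoning and content.
// thinking_forced_open = true means the prompt already emitted "<think>",
// so the output begins inside the reasoning block.
static parsed_message split_reasoning(const std::string & output, bool thinking_forced_open) {
    static const std::string open_tag  = "<think>";
    static const std::string close_tag = "</think>";

    parsed_message msg;
    size_t start       = 0;
    bool   in_thinking = thinking_forced_open;

    // Without the flag, reasoning is only recognised after an explicit open tag.
    if (!in_thinking && output.compare(0, open_tag.size(), open_tag) == 0) {
        in_thinking = true;
        start       = open_tag.size();
    }
    if (!in_thinking) {
        msg.content = output;
        return msg;
    }

    const size_t end = output.find(close_tag, start);
    if (end == std::string::npos) {
        // No closing tag yet: everything so far is reasoning.
        msg.reasoning_content = output.substr(start);
        return msg;
    }
    msg.reasoning_content = output.substr(start, end - start);
    msg.content           = output.substr(end + close_tag.size());
    return msg;
}
```

For the output "I should call the tool.</think>Hello", passing true yields
the reasoning and "Hello" separately, while passing false leaves the whole
string, closing tag included, in content.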

Changes:
- Relax Qwen3-Coder XML detection to only require the 3 shared markers
- Tighten Nemotron v3 branch to also require bare <function> and plural
  <parameters>, preventing Step-3.5-Flash from being misrouted via <think>
- Add thinking_forced_open support to Qwen3-Coder-XML init function
- Add <think>/</think> to preserved tokens
- Fix build_grammar_xml_tool_call to handle thinking_forced_open in the
  grammar root rule, allowing </think> before tool calls
- Add Step-3.5-Flash chat template and format detection test

Builds on: https://github.com/ggml-org/llama.cpp/pull/19283

* chat : route Step-3.5-Flash to Nemotron v3 PEG parser, add tests

Step-3.5-Flash uses the same XML tool call format as Qwen3-Coder and
Nemotron 3 Nano (<tool_call>/<function=...>/<parameter=...>) but with
unconditional <think> output. Route it to the Nemotron v3 PEG parser
for streaming and schema-aware parameter parsing.

Detection: templates that contain <think> plus the XML tool tags use the
Nemotron v3 PEG parser; templates without <think> (Qwen3-Coder) use the
GBNF grammar.
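
The routing rule above can be sketched roughly as follows. This is an
illustration only: the enum, the function name, and the exact marker strings
are assumptions, not the identifiers used in the llama.cpp sources.

```cpp
#include <string>

// Hypothetical labels for the two parsing paths named in this commit.
enum class xml_tool_route {
    NEMOTRON_V3_PEG,      // <think> + XML tool tags
    QWEN3_CODER_XML_GBNF, // XML tool tags without <think>
    OTHER,                // no XML tool tags: handled by other detection branches
};

static xml_tool_route route_template(const std::string & tmpl) {
    // Substring check on the raw template source.
    auto has = [&tmpl](const char * marker) {
        return tmpl.find(marker) != std::string::npos;
    };

    const bool xml_tool_tags = has("<tool_call>") && has("<function=") && has("<parameter=");
    if (!xml_tool_tags) {
        return xml_tool_route::OTHER;
    }
    return has("<think>") ? xml_tool_route::NEMOTRON_V3_PEG
                          : xml_tool_route::QWEN3_CODER_XML_GBNF;
}
```

Under these assumptions a Step-3.5-Flash style template (tool tags plus
<think>) maps to the PEG path, and a Qwen3-Coder style template (tool tags
only) maps to the GBNF path.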

Tests cover: basic messages, tool calls with/without thinking content,
parallel tool calls, code string parameters, optional </parameter>
closing tags, and JSON schema response format.

* chat : remove dead thinking code from qwen3_coder_xml

Remove thinking handling code that became unreachable after routing
Step-3.5-Flash to the Nemotron v3 PEG parser. Qwen3-Coder has no
<think> in its template, so the thinking_forced_open logic, preserved
tokens, and grammar prefix were dead paths.
2026-02-19 22:40:52 +01:00
peg-parser common : implement new jinja template engine (#18462) 2026-01-16 11:22:06 +01:00
.gitignore common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
CMakeLists.txt cmake : add variable to skip installing tests (#19370) 2026-02-09 07:12:02 +01:00
get-model.cpp ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
get-model.h ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00
test-alloc.cpp ggml : fix graph reallocation with multiple chunks (#16396) 2025-10-03 13:49:08 +02:00
test-arg-parser.cpp ci, tests : use cmake to download models and remove libcurl dependency (#18791) 2026-01-14 07:46:27 +01:00
test-autorelease.cpp docs : Minor cleanups (#19252) 2026-02-02 08:38:55 +02:00
test-backend-ops.cpp ggml : avoid UB in gemm ukernel (#19642) 2026-02-15 14:56:35 +02:00
test-backend-sampler.cpp tests : refactor test-backend-sampler (#18753) 2026-01-11 17:31:03 +02:00
test-barrier.cpp Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748) 2025-12-10 12:32:23 -08:00
test-c.c ggml : remove kompute backend (#14501) 2025-07-03 07:48:32 +03:00
test-chat-parser.cpp cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
test-chat-peg-parser.cpp cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
test-chat-template.cpp jinja : do not pass empty tools and add some none filters (#19176) 2026-01-29 14:06:54 +01:00
test-chat.cpp common : fix Step-3.5-Flash format detection and thinking support (#19635) 2026-02-19 22:40:52 +01:00
test-double-float.cpp ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-gguf.cpp GGUF: check that tensor size is representable (#19072) 2026-01-24 21:57:51 +01:00
test-grammar-integration.cpp llama : add token matching support to llama-grammar (#17816) 2025-12-09 00:32:57 -06:00
test-grammar-llguidance.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-grammar-parser.cpp llama : add token matching support to llama-grammar (#17816) 2025-12-09 00:32:57 -06:00
test-jinja.cpp Add Jinja support for "indent" string filter (#19529) 2026-02-19 00:25:52 +01:00
test-json-partial.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
test-llama-grammar.cpp llama : add token matching support to llama-grammar (#17816) 2025-12-09 00:32:57 -06:00
test-log.cpp common : use common_ prefix for common library functions (#9805) 2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
test-model-load-cancel.cpp llama : update llama_model API names (#11063) 2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c mtmd : add C public API (#13184) 2025-05-04 23:43:42 +02:00
test-opt.cpp tests : fix test-opt with GGML_BACKEND_DL (#15599) 2025-08-26 22:14:38 +02:00
test-peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
test-quantize-fns.cpp tests : fix test-quantize-fns to init the CPU backend (#12306) 2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp ci: run the x64 and arm ci on the github machines instead (#16183) 2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
test-regex-partial.cpp common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342) 2026-01-03 16:02:43 -06:00
test-rope.cpp ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) 2025-11-11 13:33:24 +02:00
test-sampling.cpp sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
test-state-restore-fragmented.cpp kv-cache: Fix state restore fragmented cache (#17982) 2025-12-15 19:28:35 +02:00
test-thread-safety.cpp server : support unified cache across slots (#16736) 2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-0.py py : logging and flake8 suppression refactoring (#7081) 2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
test-tokenizer-1-bpe.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-1-spm.cpp tool/ex/tests: consistently free ctx, then model (#18168) 2025-12-22 11:00:37 +01:00
test-tokenizer-random.py requirements : update transformers/torch for Embedding Gemma (#15828) 2025-09-09 06:06:52 +02:00
test-tokenizers-repo.sh devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00
testing.h common : implement new jinja template engine (#18462) 2026-01-16 11:22:06 +01:00