llama.cpp/common
Tarek Dakhran 2e17f6a931
mtmd : chat : Fix extra \n between text and media marker
Thanks to @tugot17 for detecting and reporting the issue.

For vision models (e.g. LFM2.5-VL-1.6B and Qwen/Qwen3-VL-4B-Instruct) `llama-mtmd-cli` produces identical output to HF implementation.

However `llama-server` doesn't. I traced it down to extra newline
inserted after `<__media__>`.

This happens in `to_json_oaicompat`, that treats media markers as text
and joins all parts with `\n` separator.

PR introduces new type `media_marker` and uses it for media markers.
Extra logic is added to prevent insertion of newlines before and after
media markers.

With this change number of input tokens is identical to HF
implementation and as a result the output is also identical.

I explored other ways to address the issue
* remove completely `\n` between text parts in `to_json_oaicompat`
* merge text messages in server-common.cpp before sending them to `to_json_oaicompat`

Please propose alternative ways of fixing this issue.
2026-02-13 16:10:48 +01:00
..
jinja chat: fix case where template accepts type content only (#19419) 2026-02-09 22:14:12 +01:00
CMakeLists.txt spec : add ngram-mod (#19164) 2026-01-30 18:21:48 +02:00
arg.cpp args : add -kvu to llama-parallel (#19577) 2026-02-12 21:52:41 +02:00
arg.h vendor : update cpp-httplib to 0.30.0 (#18660) 2026-01-08 13:53:54 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.cpp server : support preserving reasoning_content in assistant message (#18994) 2026-01-22 21:30:06 +01:00
chat-parser.h cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
chat-peg-parser.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
chat-peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat.cpp mtmd : chat : Fix extra \n between text and media marker 2026-02-13 16:10:48 +01:00
chat.h chat: fix case where template accepts type content only (#19419) 2026-02-09 22:14:12 +01:00
common.cpp common : replace deprecated codecvt using parse_utf8_codepoint (#19517) 2026-02-12 07:27:52 +01:00
common.h common : remove unused token util functions (#19506) 2026-02-11 17:41:35 +01:00
console.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
console.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
debug.cpp debug: make common_debug_print_tensor readable (#19331) 2026-02-04 17:55:31 +01:00
debug.h Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) 2026-01-14 20:29:35 +01:00
download.cpp common : improve download error reporting (#19491) 2026-02-11 09:27:55 +01:00
download.h preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
http.h common : clarify HTTPS build options in error message (#19103) 2026-01-27 06:16:00 +01:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
llguidance.cpp sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-cache.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-map.cpp llama : correct typos 'occured' and 'occurences' (#19414) 2026-02-11 07:05:31 +01:00
ngram-map.h llama : correct typos 'occured' and 'occurences' (#19414) 2026-02-11 07:05:31 +01:00
ngram-mod.cpp spec : add ngram-mod (#19164) 2026-01-30 18:21:48 +02:00
ngram-mod.h ngram-mod : fix build [no ci] (#19216) 2026-01-30 21:27:27 +02:00
peg-parser.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
preset.cpp preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
preset.h common: support remote preset (#18520) 2026-01-08 22:35:40 +01:00
regex-partial.cpp common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342) 2026-01-03 16:02:43 -06:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp llama : add adaptive-p sampler (#17927) 2026-01-15 19:16:29 +02:00
sampling.h sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
speculative.cpp spec : remove check rate (#19377) 2026-02-09 15:30:50 +02:00
speculative.h common : add common_speculative_is_compat() (#19270) 2026-02-06 16:47:22 +02:00
unicode.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
unicode.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00