llama.cpp

Tarek Dakhran c5897995a7 mtmd : chat : Fix extra \n between text and media marker (#19595)	2026-02-19 12:18:57 +01:00

* mtmd : chat : Fix extra \n between text and media marker

Thanks to @tugot17 for detecting and reporting the issue.

For vision models (e.g. LFM2.5-VL-1.6B and Qwen/Qwen3-VL-4B-Instruct), `llama-mtmd-cli` produces output identical to the HF implementation. However, `llama-server` doesn't. I traced it down to an extra newline inserted after `<__media__>`. This happens in `to_json_oaicompat`, which treats media markers as text and joins all parts with a `\n` separator.

The PR introduces a new type, `media_marker`, and uses it for media markers. Extra logic is added to prevent insertion of newlines before and after media markers. With this change, the number of input tokens is identical to the HF implementation and, as a result, the output is also identical.

I explored other ways to address the issue:
* remove `\n` between text parts completely in `to_json_oaicompat`
* merge text messages in server-common.cpp before sending them to `to_json_oaicompat`

Please propose alternative ways of fixing this issue.

* Refactor to use explicit per type ifs
* Update common/chat.cpp

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

* Update common_chat_templates_apply_legacy

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
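
The separator rule described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: `content_part` and `join_parts` are hypothetical names, and the real `to_json_oaicompat` may be structured quite differently. The sketch only shows the idea of emitting `\n` between two adjacent text parts while emitting nothing next to a media marker.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical part type for illustration; not the llama.cpp definition.
struct content_part {
    std::string type; // "text" or "media_marker"
    std::string text; // literal text, or the marker string itself
};

// Concatenate parts, inserting "\n" only between two adjacent text parts.
// No separator is added before or after a media marker.
std::string join_parts(const std::vector<content_part> & parts) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const bool both_text = i > 0
            && parts[i - 1].type == "text"
            && parts[i].type == "text";
        if (both_text) {
            out += "\n";
        }
        out += parts[i].text;
    }
    return out;
}
```

Under this rule, a text part followed by a `<__media__>` marker and another text part is concatenated without any newline, which is the property the commit message ties to matching the HF token count.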
batched-bench	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
cli	support --verbose-prompt (#19576)	2026-02-13 12:49:10 +01:00
completion	completion : simplify batch (embd) processing (#19286)	2026-02-04 05:43:28 +01:00
cvector-generator	docs : Minor cleanups (#19252)	2026-02-02 08:38:55 +02:00
export-lora	docs : Minor cleanups (#19252)	2026-02-02 08:38:55 +02:00
fit-params	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00
gguf-split	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
imatrix	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
llama-bench	Setting mmap and direct_io to false as default in llama-bench.cpp (#18841)	2026-01-16 09:46:51 +01:00
mtmd	model: support GLM-OCR (#19677)	2026-02-18 17:51:40 +01:00
perplexity	perplexity: add proper batching (#19661)	2026-02-16 18:44:44 +02:00
quantize	llama-quantize : cleanup `--help` output (#19317)	2026-02-08 09:22:38 +02:00
rpc	NetBSD build support (#19589)	2026-02-14 09:47:01 +01:00
server	mtmd : chat : Fix extra \n between text and media marker (#19595)	2026-02-19 12:18:57 +01:00
tokenize	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
tts	model : fix wavtokenizer embedding notions (#19479)	2026-02-11 07:52:20 +02:00
CMakeLists.txt	cmake: only build cli when server is enabled (#18670)	2026-01-09 16:43:26 +01:00