llama.cpp

Tarek Dakhran c5897995a7 mtmd : chat : Fix extra \n between text and media marker (#19595)	2026-02-19 12:18:57 +01:00

Thanks to @tugot17 for detecting and reporting the issue.

For vision models (e.g. LFM2.5-VL-1.6B and Qwen/Qwen3-VL-4B-Instruct), `llama-mtmd-cli` produces output identical to the HF implementation, but `llama-server` does not. I traced this to an extra newline inserted after `<__media__>`. It happens in `to_json_oaicompat`, which treats media markers as text and joins all parts with a `\n` separator.

This PR introduces a new type, `media_marker`, and uses it for media markers. Extra logic prevents newlines from being inserted before and after media markers. With this change, the number of input tokens is identical to the HF implementation, and as a result the output is also identical.

I explored other ways to address the issue:
* remove the `\n` between text parts in `to_json_oaicompat` completely
* merge text messages in server-common.cpp before sending them to `to_json_oaicompat`

Please propose alternative ways of fixing this issue.

* Refactor to use explicit per-type ifs
* Update common/chat.cpp
* Update common_chat_templates_apply_legacy

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
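
The newline handling described in the commit message above can be illustrated with a minimal sketch. This is not the code from `common/chat.cpp`: the `part_type` enum, the `content_part` struct, and the `join_parts` function are hypothetical names chosen for illustration. It only assumes the behaviour stated in the message, namely that text parts are joined with `\n` and that no newline is inserted before or after a media marker.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical part types. The names follow the commit text,
// not the actual llama.cpp data structures.
enum class part_type { text, media_marker };

struct content_part {
    part_type   type;
    std::string text;
};

// Join content parts into one prompt string.
// A "\n" separator is emitted only between two adjacent text parts;
// nothing is inserted directly before or after a media marker.
std::string join_parts(const std::vector<content_part> & parts) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            const bool prev_is_text = parts[i - 1].type == part_type::text;
            const bool curr_is_text = parts[i].type == part_type::text;
            if (prev_is_text && curr_is_text) {
                out += "\n";
            }
        }
        out += parts[i].text;
    }
    return out;
}
```

Under these assumptions, a text part followed by a `<__media__>` marker and another text part is concatenated without separators, while two consecutive text parts are still separated by a single `\n`.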
jinja	Add Jinja support for "indent" string filter (#19529)	2026-02-19 00:25:52 +01:00
CMakeLists.txt	build : cleanup library linking logic (#19665)	2026-02-17 08:36:45 +01:00
arg.cpp	args : add -kvu to llama-parallel (#19577)	2026-02-12 21:52:41 +02:00
arg.h	vendor : update cpp-httplib to 0.30.0 (#18660)	2026-01-08 13:53:54 +01:00
base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)	2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser.cpp	server : support preserving reasoning_content in assistant message (#18994)	2026-01-22 21:30:06 +01:00
chat-parser.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
chat-peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
chat-peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
chat.cpp	mtmd : chat : Fix extra \n between text and media marker (#19595)	2026-02-19 12:18:57 +01:00
chat.h	chat: fix case where template accepts type content only (#19419)	2026-02-09 22:14:12 +01:00
common.cpp	common : make small string helpers as inline functions (#19693)	2026-02-18 08:03:01 +01:00
common.h	common : make small string helpers as inline functions (#19693)	2026-02-18 08:03:01 +01:00
console.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
console.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
debug.cpp	debug: make common_debug_print_tensor readable (#19331)	2026-02-04 17:55:31 +01:00
debug.h	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)	2026-01-14 20:29:35 +01:00
download.cpp	build : remove LLAMA_HTTPLIB option (#19623)	2026-02-15 15:38:50 +01:00
download.h	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
http.h	common : clarify HTTPS build options in error message (#19103)	2026-01-27 06:16:00 +01:00
json-partial.cpp	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
json-partial.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
llguidance.cpp	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
log.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
log.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
ngram-cache.cpp	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-cache.h	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-map.cpp	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-map.h	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-mod.cpp	spec : add ngram-mod (#19164)	2026-01-30 18:21:48 +02:00
ngram-mod.h	ngram-mod : fix build [no ci] (#19216)	2026-01-30 21:27:27 +02:00
peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
preset.cpp	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
preset.h	common: support remote preset (#18520)	2026-01-08 22:35:40 +01:00
regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
regex-partial.h	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
sampling.cpp	llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00
sampling.h	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
speculative.cpp	spec : remove check rate (#19377)	2026-02-09 15:30:50 +02:00
speculative.h	common : add common_speculative_is_compat() (#19270)	2026-02-06 16:47:22 +02:00
unicode.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
unicode.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00