llama.cpp


Tarek Dakhran 2e17f6a931 mtmd : chat : Fix extra \n between text and media marker		2026-02-13 16:10:48 +01:00

Thanks to @tugot17 for detecting and reporting the issue.

For vision models (e.g. LFM2.5-VL-1.6B and Qwen/Qwen3-VL-4B-Instruct), `llama-mtmd-cli` produces output identical to the HF implementation, but `llama-server` does not. I traced this to an extra newline inserted after `<__media__>`. It happens in `to_json_oaicompat`, which treats media markers as text and joins all parts with a `\n` separator.

This PR introduces a new type, `media_marker`, and uses it for media markers. Extra logic prevents newlines from being inserted before and after media markers. With this change, the number of input tokens is identical to the HF implementation, and as a result the output is also identical.

Other ways I explored to address the issue:
* remove the `\n` between text parts in `to_json_oaicompat` completely
* merge text messages in server-common.cpp before sending them to `to_json_oaicompat`

Please propose alternative ways of fixing this issue.
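The joining rule described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from chat.cpp: `part_type`, `content_part`, `join_parts`, and `join_parts_as_text` are hypothetical names, and only the `\n` separator, the `<__media__>` marker string, and the idea of a distinct `media_marker` type come from the commit message.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical part kinds for illustration; the real types in chat.cpp may differ.
enum class part_type { text, media_marker };

// Hypothetical content part: either plain text or a media marker such as "<__media__>".
struct content_part {
    part_type   type;
    std::string value;
};

// Joins parts with '\n', except that no separator is placed directly before or
// after a media marker, so the marker stays attached to its neighbouring text.
std::string join_parts(const std::vector<content_part> & parts) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            const bool prev_is_marker = parts[i - 1].type == part_type::media_marker;
            const bool curr_is_marker = parts[i].type     == part_type::media_marker;
            if (!prev_is_marker && !curr_is_marker) {
                out += '\n';
            }
        }
        out += parts[i].value;
    }
    return out;
}

// The behaviour the commit describes as the bug: markers are treated as text,
// so every pair of adjacent parts gets a '\n' separator.
std::string join_parts_as_text(const std::vector<content_part> & parts) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) {
            out += '\n';
        }
        out += parts[i].value;
    }
    return out;
}
```

For the parts text, marker, text, text, `join_parts` yields `Describe this image:<__media__>Be brief.\nAnswer in English.`, while `join_parts_as_text` puts a `\n` on both sides of `<__media__>`. Separators between two plain text parts are kept; only those adjacent to a marker are dropped, which is what changes the input token count.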
jinja	chat: fix case where template accepts type content only (#19419)	2026-02-09 22:14:12 +01:00
CMakeLists.txt	spec : add ngram-mod (#19164)	2026-01-30 18:21:48 +02:00
arg.cpp	args : add -kvu to llama-parallel (#19577)	2026-02-12 21:52:41 +02:00
arg.h	vendor : update cpp-httplib to 0.30.0 (#18660)	2026-01-08 13:53:54 +01:00
base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)	2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser.cpp	server : support preserving reasoning_content in assistant message (#18994)	2026-01-22 21:30:06 +01:00
chat-parser.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
chat-peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
chat-peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
chat.cpp	mtmd : chat : Fix extra \n between text and media marker	2026-02-13 16:10:48 +01:00
chat.h	chat: fix case where template accepts type content only (#19419)	2026-02-09 22:14:12 +01:00
common.cpp	common : replace deprecated codecvt using parse_utf8_codepoint (#19517)	2026-02-12 07:27:52 +01:00
common.h	common : remove unused token util functions (#19506)	2026-02-11 17:41:35 +01:00
console.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
console.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
debug.cpp	debug: make common_debug_print_tensor readable (#19331)	2026-02-04 17:55:31 +01:00
debug.h	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)	2026-01-14 20:29:35 +01:00
download.cpp	common : improve download error reporting (#19491)	2026-02-11 09:27:55 +01:00
download.h	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
http.h	common : clarify HTTPS build options in error message (#19103)	2026-01-27 06:16:00 +01:00
json-partial.cpp	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
json-partial.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
llguidance.cpp	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
log.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
log.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
ngram-cache.cpp	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-cache.h	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-map.cpp	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-map.h	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-mod.cpp	spec : add ngram-mod (#19164)	2026-01-30 18:21:48 +02:00
ngram-mod.h	ngram-mod : fix build [no ci] (#19216)	2026-01-30 21:27:27 +02:00
peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
preset.cpp	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
preset.h	common: support remote preset (#18520)	2026-01-08 22:35:40 +01:00
regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
regex-partial.h	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
sampling.cpp	llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00
sampling.h	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
speculative.cpp	spec : remove check rate (#19377)	2026-02-09 15:30:50 +02:00
speculative.h	common : add common_speculative_is_compat() (#19270)	2026-02-06 16:47:22 +02:00
unicode.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
unicode.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00