llama.cpp

Daniel Bevenius 2b6dfe824d	2026-02-23 07:04:30 +01:00

llama : remove write/read of output ids/logits/embeddings (#18862)

* llama : remove write/read of output ids/logits/embeddings

  This commit removes the write/read of output ids, logits and embeddings from the llama context state.

  Refs: https://github.com/ggml-org/llama.cpp/pull/18862#issuecomment-3756330941

* completion : add replaying of session state

  This commit updates the session handling in the completion tool to account for the fact that logits are no longer stored in the session file. Instead, the last token has to be replayed to get the logits for sampling.

* common : add common_prompt_batch_decode function

  This commit adds a new function that decodes the prompt and optionally handles saving the session data.

* update save-state.cpp to use llama_state_load_file

  This commit updates the save-load-state example to use the new llama_state_load_file function for loading the model state from a file. It also replays the last token after loading, since the state is now stored before the last token is processed.

* examples : set n_seq_max = 2 for ctx3

  This commit updates the save-load-state example to set the n_seq_max parameter to 2 when initializing the ctx3 context.

  The motivation for this change is that with n_parallel/n_seq_max set to 1 the context supports only one sequence, but the test later tries to use a second sequence, which results in the following error:

  ```console
  main : loaded state with 4 tokens
  main : seq 0 copied, 225760 bytes
  main : kv cache cleared
  find_slot: seq_id=1 >= n_seq_max=1 Try using a bigger --parallel value
  state_read_meta: failed to find available cells in kv cache
  ```

  This seems to happen only for recurrent/hybrid models.
jinja	jinja: correct stats for tojson and string filters (#19785)	2026-02-22 21:08:23 +01:00
CMakeLists.txt	build : cleanup library linking logic (#19665)	2026-02-17 08:36:45 +01:00
arg.cpp	args : add -kvu to llama-parallel (#19577)	2026-02-12 21:52:41 +02:00
arg.h	vendor : update cpp-httplib to 0.30.0 (#18660)	2026-01-08 13:53:54 +01:00
base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)	2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp	common : fix improper trimming in XML parser on complete message (#19805)	2026-02-22 17:34:54 +01:00
chat-parser-xml-toolcall.h	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser.cpp	common : merge qwen3-coder and nemotron nano 3 parsers (#19765)	2026-02-20 23:22:22 +01:00
chat-parser.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
chat-peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
chat-peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
chat.cpp	common : merge qwen3-coder and nemotron nano 3 parsers (#19765)	2026-02-20 23:22:22 +01:00
chat.h	common : merge qwen3-coder and nemotron nano 3 parsers (#19765)	2026-02-20 23:22:22 +01:00
common.cpp	llama : remove write/read of output ids/logits/embeddings (#18862)	2026-02-23 07:04:30 +01:00
common.h	llama : remove write/read of output ids/logits/embeddings (#18862)	2026-02-23 07:04:30 +01:00
console.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
console.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
debug.cpp	debug: make common_debug_print_tensor readable (#19331)	2026-02-04 17:55:31 +01:00
debug.h	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)	2026-01-14 20:29:35 +01:00
download.cpp	build : remove LLAMA_HTTPLIB option (#19623)	2026-02-15 15:38:50 +01:00
download.h	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
http.h	common : clarify HTTPS build options in error message (#19103)	2026-01-27 06:16:00 +01:00
json-partial.cpp	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
json-partial.h	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
llguidance.cpp	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
log.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
log.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
ngram-cache.cpp	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-cache.h	spec : add self‑speculative decoding (no draft model required) + refactor (#18471)	2026-01-28 19:42:42 +02:00
ngram-map.cpp	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-map.h	llama : correct typos 'occured' and 'occurences' (#19414)	2026-02-11 07:05:31 +01:00
ngram-mod.cpp	spec : add ngram-mod (#19164)	2026-01-30 18:21:48 +02:00
ngram-mod.h	ngram-mod : fix build [no ci] (#19216)	2026-01-30 21:27:27 +02:00
peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
preset.cpp	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
preset.h	common: support remote preset (#18520)	2026-01-08 22:35:40 +01:00
regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
regex-partial.h	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
sampling.cpp	llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00
sampling.h	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
speculative.cpp	spec : remove check rate (#19377)	2026-02-09 15:30:50 +02:00
speculative.h	common : add common_speculative_is_compat() (#19270)	2026-02-06 16:47:22 +02:00
unicode.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
unicode.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00