llama.cpp/common
Yes You Can Have Your Own 50e0ad08fb
server: save and clear idle slots on new task (`--clear-idle`) (#20993)
* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)

* server: move idle slot KV clearing to slot release

The save "cost" is now paid by the finishing request.

* server: add --kv-clear-idle flag, enable by default

* server: skip clearing last idle slot, clear on launch

* server: test --no-kv-clear-idle flag

* server: simplify on-release clearing loop

* server: remove on-release KV clearing, keep launch-only

* cont : clean-up

* tests: update log strings after --clear-idle rename

* tests: use debug tags instead of log message matching

* test: fix Windows CI by dropping temp log file unlink

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-04-03 19:02:27 +02:00
jinja jinja: coerce input for string-specific filters (#21370) 2026-04-03 15:03:33 +02:00
CMakeLists.txt common : add standard Hugging Face cache support (#20775) 2026-03-24 07:30:33 +01:00
arg.cpp server: save and clear idle slots on new task (`--clear-idle`) (#20993) 2026-04-03 19:02:27 +02:00
arg.h vendor : update cpp-httplib to 0.30.0 (#18660) 2026-01-08 13:53:54 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-auto-parser-generator.cpp common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat-auto-parser-helpers.cpp common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) 2026-03-23 22:21:47 -05:00
chat-auto-parser-helpers.h chat : avoid including json in chat.h (#21306) 2026-04-03 09:07:59 +03:00
chat-auto-parser.h common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat-diff-analyzer.cpp common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat-peg-parser.cpp fix: gemma 4 template (#21326) 2026-04-02 23:31:02 +02:00
chat-peg-parser.h fix: gemma 4 template (#21326) 2026-04-02 23:31:02 +02:00
chat.cpp common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
chat.h common/parser: fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers (#21230) 2026-04-03 17:51:52 +02:00
common.cpp tests: allow exporting graph ops from HF file without downloading weights (#21182) 2026-04-02 18:19:20 +02:00
common.h server: save and clear idle slots on new task (`--clear-idle`) (#20993) 2026-04-03 19:02:27 +02:00
console.cpp cli : add command and file auto-completion (#19985) 2026-03-05 10:47:28 +01:00
console.h cli : add command and file auto-completion (#19985) 2026-03-05 10:47:28 +01:00
debug.cpp debug: make common_debug_print_tensor readable (#19331) 2026-02-04 17:55:31 +01:00
debug.h chore : correct typos [no ci] (#20041) 2026-03-05 08:50:21 +01:00
download.cpp common : cleanup logs and modernize the progress bar (#21215) 2026-03-31 16:18:00 +02:00
download.h common : add standard Hugging Face cache support (#20775) 2026-03-24 07:30:33 +01:00
hf-cache.cpp common : add getpwuid fallback for HF cache when HOME is not set (#21035) 2026-03-26 20:34:23 +01:00
hf-cache.h common : fix split model migration (#21019) 2026-03-26 12:04:37 +01:00
http.h server: Parse port numbers from MCP server URLs in CORS proxy (#20208) 2026-03-09 17:47:54 +01:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp common/json-schema: fix: handle non-capturing groups (?:...) in JSON schema pattern converter (#21124) 2026-03-28 17:55:38 +01:00
json-schema-to-grammar.h common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
llguidance.cpp sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-cache.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-map.cpp llama : correct typos 'occured' and 'occurences' (#19414) 2026-02-11 07:05:31 +01:00
ngram-map.h fix: correct misspellings in code comments (#21217) 2026-03-31 13:50:51 +02:00
ngram-mod.cpp spec : add ngram-mod (#19164) 2026-01-30 18:21:48 +02:00
ngram-mod.h ngram-mod : fix build [no ci] (#19216) 2026-01-30 21:27:27 +02:00
peg-parser.cpp common : fix tool call type detection for nullable and enum schemas (#21327) 2026-04-03 17:51:23 +02:00
peg-parser.h common: consolidate PEG string parsers (#20263) 2026-03-10 00:29:21 +01:00
preset.cpp preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
preset.h common: support remote preset (#18520) 2026-01-08 22:35:40 +01:00
reasoning-budget.cpp common : inhibit lazy grammar sampler while reasoning is active (#20970) 2026-03-27 18:30:40 +01:00
reasoning-budget.h common : inhibit lazy grammar sampler while reasoning is active (#20970) 2026-03-27 18:30:40 +01:00
regex-partial.cpp common : fix iterator::end() dereference (#20445) 2026-03-16 08:50:38 +02:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp common : Disable backend sampling if reasoning budget is enabled (#21209) 2026-03-31 10:14:01 +03:00
sampling.h sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
speculative.cpp spec : remove check rate (#19377) 2026-02-09 15:30:50 +02:00
speculative.h common : add common_speculative_is_compat() (#19270) 2026-02-06 16:47:22 +02:00
unicode.cpp common/parser: handle reasoning budget (#20297) 2026-03-11 10:26:12 +01:00
unicode.h common/parser: handle reasoning budget (#20297) 2026-03-11 10:26:12 +01:00