llama.cpp/tools
Xuan-Son Nguyen ddcb75dd8a
server: add auto-sleep after N seconds of idle (#18228)
* implement sleeping at queue level

* implement server-context suspend

* add test

* add docs

* optimization: add fast path

* make sure to free llama_init

* nits

* fix use-after-free

* allow /models to be accessed during sleeping, fix use-after-free

* don't allow accessing /models during sleep, it is not thread-safe

* fix data race on accessing props and model_meta

* small clean up

* trailing whitespace

* rm outdated comments
2025-12-21 02:24:42 +01:00
batched-bench
cli server: add auto-sleep after N seconds of idle (#18228) 2025-12-21 02:24:42 +01:00
completion arg: clarify auto kvu/np being set on server (#17997) 2025-12-16 12:01:27 +01:00
cvector-generator common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
export-lora
fit-params llama-fit-params: QoL impr. for prints/errors (#18089) 2025-12-17 00:03:19 +01:00
gguf-split cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
imatrix common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
llama-bench cli: fixed dead links to tools/main for cli and completion, fixed code owners (#17993) 2025-12-15 11:47:04 +01:00
mtmd model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) 2025-12-19 00:18:01 +01:00
perplexity common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
quantize cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
rpc Install rpc-server when GGML_RPC is ON. (#17149) 2025-11-11 10:53:59 +00:00
run Manually link -lbsd to resolve flock symbol on AIX (#16610) 2025-10-23 19:37:31 +08:00
server server: add auto-sleep after N seconds of idle (#18228) 2025-12-21 02:24:42 +01:00
tokenize
tts common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
CMakeLists.txt llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) 2025-12-15 09:24:59 +01:00