llama.cpp/tools/server/tests/unit
Latest commit: cd5e3b5754 by Georgi Gerganov
server : support unified cache across slots (#16736)
* server : support unified context across slots
* cont : fix speculative decoding initialization
* context : fix n_ctx_per_seq computation
* server : purge slots one by one
* tests : add unified cache server tests
* llama : update per-seq context computation
* test-thread-safety : handle tiny training context of the input model
* server : fix server_tokens clear()
* server : use 4 slots + unified KV by default
* llama : add note about context size queries
* cont : update todos [no ci]
* context : do not cap the size of the context
* tests : adjust parameters to be CI friendlier
* context : add warning

2025-11-02 18:14:04 +02:00
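The commit bullets above mention fixing the `n_ctx_per_seq` computation and defaulting to 4 slots with a unified KV cache. A minimal sketch of the distinction being tested, assuming the usual split-vs-unified semantics (the function and its exact behavior are illustrative, not llama.cpp's actual code):

```python
# Illustrative sketch: per-sequence context budget under a split KV cache
# versus a unified one. Names and semantics are assumptions inferred from
# the commit titles, not the llama.cpp implementation.

def n_ctx_per_seq(n_ctx: int, n_seq_max: int, unified: bool) -> int:
    """Return the context budget available to a single sequence (slot)."""
    if unified:
        # Unified cache: all slots draw from one shared pool, so a single
        # sequence may use up to the whole context if the others stay short.
        return n_ctx
    # Split cache: the total context is divided evenly across the slots.
    return n_ctx // n_seq_max

# With 4 slots and an 8192-token context:
assert n_ctx_per_seq(8192, 4, unified=False) == 2048
assert n_ctx_per_seq(8192, 4, unified=True) == 8192
```

Under the unified default, long prompts in one slot no longer fail outright at `n_ctx / n_seq_max` tokens; they compete for the shared pool instead.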
test_basic.py
test_chat_completion.py
test_completion.py
test_ctx_shift.py
test_embedding.py
test_infill.py
test_lora.py
test_rerank.py
test_security.py
test_slot_save.py
test_speculative.py
test_template.py
test_tokenize.py
test_tool_call.py
test_vision_api.py