llama.cpp/tools/server/tests/unit
Latest commit: cd5e3b5754 by Georgi Gerganov
server : support unified cache across slots (#16736)
* server : support unified context across slots
* cont : fix speculative decoding initialization
* context : fix n_ctx_per_seq computation
* server : purge slots one by one
* tests : add unified cache server tests
* llama : update per-seq context computation
* test-thread-safety : handle tiny training context of the input model
* server : fix server_tokens clear()
* server : use 4 slots + unified KV by default
* llama : add note about context size queries
* cont : update todos [no ci]
* context : do not cap the size of the context
* tests : adjust parameters to be CI friendlier
* context : add warning

2025-11-02 18:14:04 +02:00
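The commit bullets above mention fixing the `n_ctx_per_seq` computation and defaulting to 4 slots with a unified KV cache. A minimal sketch of the distinction being tested, assuming the usual split-vs-unified semantics (the function and its exact behavior are illustrative, not llama.cpp's actual code):

```python
# Illustrative sketch: per-sequence context budget under a split KV cache
# versus a unified one. Names and semantics are assumptions inferred from
# the commit titles, not the llama.cpp implementation.

def n_ctx_per_seq(n_ctx: int, n_seq_max: int, unified: bool) -> int:
    """Return the context budget available to a single sequence (slot)."""
    if unified:
        # Unified cache: all slots draw from one shared pool, so a single
        # sequence may use up to the whole context if the others stay short.
        return n_ctx
    # Split cache: the total context is divided evenly across the slots.
    return n_ctx // n_seq_max

# With 4 slots and an 8192-token context:
assert n_ctx_per_seq(8192, 4, unified=False) == 2048
assert n_ctx_per_seq(8192, 4, unified=True) == 8192
```

Under the unified default, long prompts in one slot no longer fail outright at `n_ctx / n_seq_max` tokens; they compete for the shared pool instead.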
test_basic.py
test_chat_completion.py
test_completion.py
test_ctx_shift.py
test_embedding.py
test_infill.py
test_lora.py
test_rerank.py
test_security.py
test_slot_save.py
test_speculative.py
test_template.py
test_tokenize.py
test_tool_call.py
test_vision_api.py