llama.cpp

History

Georgi Gerganov cd5e3b5754 server : support unified cache across slots (#16736 ) * server : support unified context across slots * cont : fix speculative decoding initialization * context : fix n_ctx_per_seq computation * server : purge slots one by one * tests : add unified cache server tests * llama : update per-seq context computation * test-thread-safety : handle tiny training context of the input model * server : fix server_tokens clear() * server : use 4 slots + unified KV by default * llama : add note about context size queries * cont : update todos [no ci] * context : do not cap the size of the context * tests : adjust parameters to be CI friendlier * context : add warning	2025-11-02 18:14:04 +02:00
..
llama-cpp.h	llama : add `llama_vocab`, functions -> methods, naming (#11110 )	2025-01-12 11:32:42 +02:00
llama.h	server : support unified cache across slots (#16736 )	2025-11-02 18:14:04 +02:00

server : support unified cache across slots (#16736 )

* server : support unified context across slots

* cont : fix speculative decoding initialization

* context : fix n_ctx_per_seq computation

* server : purge slots one by one

* tests : add unified cache server tests

* llama : update per-seq context computation

* test-thread-safety : handle tiny training context of the input model

* server : fix server_tokens clear()

* server : use 4 slots + unified KV by default

* llama : add note about context size queries

* cont : update todos [no ci]

* context : do not cap the size of the context

* tests : adjust parameters to be CI friendlier

* context : add warning

2025-11-02 18:14:04 +02:00

llama-cpp.h

llama : add `llama_vocab`, functions -> methods, naming (#11110 )

2025-01-12 11:32:42 +02:00

llama.h

server : support unified cache across slots (#16736 )

2025-11-02 18:14:04 +02:00