llama.cpp/tools/server/tests/unit
Pascal e7c2cf1356
server: add router multi-model tests (#17704) (#17722)
* llama-server: add router multi-model tests (#17704)

Add 4 test cases for model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication

Tests load models asynchronously and poll until loading completes. The
eviction test is skipped gracefully when too few models are available.
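
The polling pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in test_router.py: `wait_for_status`, `get_status`, and the status strings are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], wanted: str,
                    timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll get_status() until it returns `wanted` or the timeout expires.

    get_status is a hypothetical callable; in a real test it would query
    the server for the state of one model.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_status() == wanted:
            return True
        time.sleep(interval)  # back off briefly before asking again
    return False
```

A test built on such a helper would assert on the returned boolean. The graceful skip could plausibly be a `pytest.skip(...)` call issued when fewer models are available than the eviction test needs.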

utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)
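
The JSON fallback mentioned in the list above might look something like this sketch. It uses only the standard library. `parse_body` is a hypothetical helper name, and the actual change in utils.py may be structured differently.

```python
import json
from typing import Any


def parse_body(text: str) -> Any:
    """Return the decoded JSON value, or the raw text if it is not JSON.

    Hypothetical helper: it mirrors the idea of falling back to text for
    non-JSON error responses, not the exact code in utils.py.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Error bodies such as a plain-text message are kept as-is.
        return text
```

With this shape, a test can inspect the body whether the server replied with a JSON error object or with plain text.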

* llama-server: update test models to new HF repos

* add offline

* llama-server: fix router LRU eviction test and add preloading

Fix the eviction test: load 2 models first, verify their state, then
load the 3rd to trigger eviction. The previous logic loaded all 3 at
once, so the first model was evicted before verification could occur.
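
To see why the order matters, here is a toy model of LRU eviction with a cap of 2 loaded models. It is an illustrative sketch only: `LruModelSet` is a hypothetical class, and it does not reproduce how llama-server implements --models-max.

```python
from collections import OrderedDict
from typing import List, Optional


class LruModelSet:
    """Toy LRU set of loaded model names (hypothetical, for illustration)."""

    def __init__(self, max_loaded: int) -> None:
        self.max_loaded = max_loaded
        self._loaded: "OrderedDict[str, bool]" = OrderedDict()

    def load(self, name: str) -> Optional[str]:
        """Load or touch a model; return the evicted name, if any."""
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
            return None
        evicted = None
        if len(self._loaded) >= self.max_loaded:
            # Drop the least recently used entry (front of the dict).
            evicted, _ = self._loaded.popitem(last=False)
        self._loaded[name] = True
        return evicted

    def loaded(self) -> List[str]:
        """Names ordered from least to most recently used."""
        return list(self._loaded)
```

Loading two models and checking `loaded()` before loading the third gives the test a stable state to verify. Only the third `load` call returns an evicted name.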

Add a module fixture that preloads models via ServerPreset.load_all(),
and mark test presets as offline so they use cached models.

* llama-server: fix split model download on Windows

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-12-03 15:10:37 +01:00
| File | Last commit | Date |
| --- | --- | --- |
| test_basic.py | server: add router multi-model tests (#17704) (#17722) | 2025-12-03 15:10:37 +01:00 |
| test_chat_completion.py | Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) | 2025-12-02 17:33:50 +01:00 |
| test_compat_anthropic.py | server : add Anthropic Messages API support (#17570) | 2025-11-28 12:57:04 +01:00 |
| test_completion.py | server : handle failures to restore host cache (#17078) | 2025-11-09 14:27:05 +02:00 |
| test_ctx_shift.py | memory : remove KV cache size padding (#16812) | 2025-10-28 20:19:44 +02:00 |
| test_embedding.py | server : disable context shift by default (#15416) | 2025-08-19 16:46:37 +03:00 |
| test_infill.py | server : support unified cache across slots (#16736) | 2025-11-02 18:14:04 +02:00 |
| test_lora.py | server : disable context shift by default (#15416) | 2025-08-19 16:46:37 +03:00 |
| test_rerank.py | server / ranking : add sorting and management of top_n (#16403) | 2025-10-11 16:39:04 +03:00 |
| test_router.py | server: add router multi-model tests (#17704) (#17722) | 2025-12-03 15:10:37 +01:00 |
| test_security.py | server: add --media-path for local media files (#17697) | 2025-12-02 22:49:20 +01:00 |
| test_slot_save.py | server : disable context shift by default (#15416) | 2025-08-19 16:46:37 +03:00 |
| test_speculative.py | kv-cache : pad the cache size to 256 for performance (#17046) | 2025-11-07 20:03:25 +02:00 |
| test_template.py | server : speed up tests (#15836) | 2025-09-06 14:45:24 +02:00 |
| test_tokenize.py | server : disable context shift by default (#15416) | 2025-08-19 16:46:37 +03:00 |
| test_tool_call.py | server : speed up tests (#15836) | 2025-09-06 14:45:24 +02:00 |
| test_vision_api.py | server : speed up tests (#15836) | 2025-09-06 14:45:24 +02:00 |