* llama-server: add router multi-model tests (#17704)
Add 4 test cases for model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication
Tests use async model loading with polling and graceful skip when
insufficient models available for eviction testing.
utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)
* llama-server: update test models to new HF repos
* add offline
* llama-server: fix router LRU eviction test and add preloading
Fix eviction test: load 2 models first, verify state, then load
3rd to trigger eviction. Previous logic loaded all 3 at once,
causing first model to be evicted before verification could occur.
Add module fixture to preload models via ServerPreset.load_all()
and mark test presets as offline to use cached models
* llama-server: fix split model download on Windows
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>