In router mode with --models-max 1, switching models kills the child
process, destroying all in-memory state, including the prompt cache and
context checkpoints. Every time a client swaps back to a previously used
model, the full prompt must be re-processed, which can take tens of
seconds for long prompts.
This patch adds two methods (auto_save_slots, auto_restore_slots) that
are called automatically during the child process lifecycle:
- auto_save_slots: called after start_loop() returns (before clean_up()),
saves each slot's state + checkpoints to --slot-save-path, using the
model filename stem as the save name.
- auto_restore_slots: called after load_model() (before start_loop()),
checks whether a save file exists for this model and, if so, restores it.
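The lifecycle ordering and save naming described above can be sketched as a small Python model. This is not the llama.cpp server code: the class ChildProcessSketch, its bytes-valued slot state, and the `<stem>.slot<N>.bin` per-slot file layout are hypothetical. Only the call order (restore after load_model(), save after start_loop() returns) and the use of the model filename stem as the save name come from this patch.

```python
from pathlib import Path


class ChildProcessSketch:
    """Hypothetical model of one router child process (not llama.cpp's API).

    Each slot's state is a plain bytes object standing in for the KV cache
    plus context checkpoints.
    """

    def __init__(self, model_path, slot_save_path, n_slots=1):
        self.model_path = Path(model_path)
        self.slot_save_path = Path(slot_save_path)  # plays the role of --slot-save-path
        self.slots = [b""] * n_slots
        self.calls = []  # records the lifecycle order for inspection

    def save_file(self, slot_id):
        # Save name is the model filename stem ("Qwen3.5-27B.gguf" -> "Qwen3.5-27B");
        # the ".slot<N>.bin" suffix is a hypothetical per-slot layout.
        return self.slot_save_path / f"{self.model_path.stem}.slot{slot_id}.bin"

    def load_model(self):
        self.calls.append("load_model")

    def auto_restore_slots(self):
        self.calls.append("auto_restore_slots")
        for slot_id in range(len(self.slots)):
            path = self.save_file(slot_id)
            if path.exists():  # only state saved under this model's stem is restored
                self.slots[slot_id] = path.read_bytes()

    def start_loop(self, serve=None):
        # Stand-in for the request loop; returns when the router stops this child.
        self.calls.append("start_loop")
        if serve is not None:
            serve(self)

    def auto_save_slots(self):
        self.calls.append("auto_save_slots")
        self.slot_save_path.mkdir(parents=True, exist_ok=True)
        for slot_id, state in enumerate(self.slots):
            self.save_file(slot_id).write_bytes(state)

    def clean_up(self):
        self.calls.append("clean_up")

    def run(self, serve=None):
        self.load_model()
        self.auto_restore_slots()  # after load_model(), before start_loop()
        self.start_loop(serve)
        self.auto_save_slots()     # after start_loop() returns, before clean_up()
        self.clean_up()
```

Running a 27B child, then a MoE child, then a 27B child again against the same save directory leaves the second 27B child holding the first one's slot state, because each model only reads and writes files under its own filename stem.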
Combined with the checkpoint persistence from the previous commit,
this makes model hot-swapping fully transparent: the conversation
context is preserved across swaps with no client-side changes.
Tested with Qwen3.5-27B + Qwen3.5-35B-A3B MoE in router mode:
- Swap 27B→MoE: ~7s (incl. auto-save of 826 MiB state + 749 MiB checkpoints)
- Swap MoE→27B: ~6s (incl. auto-restore)
- cache_n after restore: 26549 tokens reused (91ms vs 23s without restore)
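To put the test figures in proportion, the short calculation below re-derives two ratios using only the numbers reported above: the total written by the 27B auto-save, and how much faster the post-restore request was than a run without restore. No new measurements are involved.

```python
# Figures reported above (Qwen3.5-27B <-> Qwen3.5-35B-A3B, router mode)
state_mib = 826        # 27B slot state written by auto-save
checkpoint_mib = 749   # 27B context checkpoints written by auto-save
without_restore_s = 23.0
with_restore_s = 0.091

written_mib = state_mib + checkpoint_mib
speedup = without_restore_s / with_restore_s

print(f"27B auto-save writes {written_mib} MiB (~{written_mib / 1024:.2f} GiB)")
print(f"restored request is ~{speedup:.0f}x faster than without restore")
```

In other words, each swap-out trades roughly 1.5 GiB of disk writes for skipping tens of seconds of prompt re-processing on the way back.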