Fixes #6263, where the server accepts mismatched batch/ubatch values with
embeddings, leading to suboptimal or incorrect behavior.
Problem: Embeddings and reranking use non-causal attention, which requires
all tokens to be processed within a single ubatch. When n_batch != n_ubatch,
the configuration is incoherent. The defaults differ (n_batch=2048,
n_ubatch=512), so users hit this even without passing -b or -ub.
Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings are enabled and n_batch != n_ubatch:
  * Log warnings explaining the requirement
  * Automatically set both to min(n_batch, n_ubatch)
  * Leave the configuration coherent (n_batch == n_ubatch)
This follows the auto-correction approach suggested by @mirekphd
and provides better UX than strict rejection.
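
A minimal sketch of the auto-correction described above, for illustration only. The `batch_params` struct and the `normalize_embedding_batch` helper are hypothetical names, not the server's actual types, and the warning text is a placeholder. The real check lives in main() after common_params_parse() and operates on the parsed parameters there.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the relevant fields of the parsed server parameters.
// Default values mirror the ones quoted in this description.
struct batch_params {
    int32_t n_batch   = 2048;  // logical batch size (-b)
    int32_t n_ubatch  = 512;   // physical micro-batch size (-ub)
    bool    embedding = false; // true when --embedding is passed
};

// Hypothetical helper: if embeddings are enabled and the two sizes differ,
// warn and clamp both to the smaller value. Returns true if it changed anything.
static bool normalize_embedding_batch(batch_params & p) {
    if (!p.embedding || p.n_batch == p.n_ubatch) {
        return false; // nothing to fix
    }

    const int32_t n = std::min(p.n_batch, p.n_ubatch);

    // Placeholder wording; the actual log messages may differ.
    std::fprintf(stderr,
        "warning: embeddings require n_batch == n_ubatch (got %d and %d); setting both to %d\n",
        (int) p.n_batch, (int) p.n_ubatch, (int) n);

    p.n_batch  = n;
    p.n_ubatch = n;
    return true;
}
```

With the defaults and `embedding = true`, this sketch sets both values to 512. With `-b 512 -ub 512` it leaves them untouched and logs nothing.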
Testing:
✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, adjusts both to 512
✅ No false positives: -b 512 -ub 512 --embedding → no warnings
✅ Verified on macOS M3 Pro with embedding model