llama.cpp

ytian218 3827a23255 server: validate n_batch == n_ubatch for embeddings (#6263)		2025-12-16 23:35:48 -05:00

Fixes #6263, where the server accepts mismatched batch/ubatch values with embeddings, leading to suboptimal or incorrect behavior.

Problem:
Embeddings and reranking use non-causal attention, which requires all tokens to be processed within a single ubatch. When n_batch != n_ubatch, the configuration is incoherent. The default values differ (n_batch=2048, n_ubatch=512), so users encounter this frequently.

Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings are enabled and n_batch != n_ubatch:
  * Log warnings explaining the requirement
  * Automatically set both to min(n_batch, n_ubatch)
  * Ensure a coherent configuration

This follows the auto-correction approach suggested by @mirekphd and provides better UX than strict rejection.

Testing:
✅ Builds successfully
✅ Validation triggers: -b 2048 -ub 512 --embedding → logs warnings, adjusts both to 512
✅ No false positives: -b 512 -ub 512 --embedding → no warnings
✅ Verified on macOS M3 Pro with an embedding model
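
The auto-correction described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the struct batch_params and the function fix_embedding_batch are hypothetical names standing in for the relevant fields of the parsed parameters, and the warning text is invented for the example. Only the defaults (2048 and 512) and the min() rule are taken from the commit message.

```cpp
#include <algorithm>
#include <cstdio>

// Hypothetical stand-in for the few parsed parameters that matter here.
// The defaults are the ones quoted in the commit message.
struct batch_params {
    int  n_batch   = 2048;  // logical batch size (-b)
    int  n_ubatch  = 512;   // physical micro-batch size (-ub)
    bool embedding = false; // true when --embedding is passed
};

// Hypothetical helper: if embeddings are enabled and the two sizes differ,
// warn and clamp both to the smaller value. Returns true if it changed anything.
bool fix_embedding_batch(batch_params & p) {
    if (!p.embedding || p.n_batch == p.n_ubatch) {
        return false; // nothing to correct
    }

    const int n = std::min(p.n_batch, p.n_ubatch);

    // Illustrative warning text, not the actual server log output.
    std::fprintf(stderr,
                 "warning: embeddings require n_batch == n_ubatch "
                 "(got %d and %d); setting both to %d\n",
                 p.n_batch, p.n_ubatch, n);

    p.n_batch  = n;
    p.n_ubatch = n;
    return true;
}
```

Called once after argument parsing, a helper like this would leave text-generation configurations untouched and only adjust the sizes when embeddings are requested. Clamping to the smaller value keeps memory use within what the user already allowed, at the cost of a shorter maximum input per batch.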
batched-bench	batched-bench : add "separate text gen" mode (#17103)	2025-11-10 12:59:29 +02:00
cli	cli: fixed dead links to tools/main for cli and completion, fixed code owners (#17993)	2025-12-15 11:47:04 +01:00
completion	arg: clarify auto kvu/np being set on server (#17997)	2025-12-16 12:01:27 +01:00
cvector-generator	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
export-lora	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
fit-params	llama-fit-params: QoL impr. for prints/errors (#18089)	2025-12-17 00:03:19 +01:00
gguf-split	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
imatrix	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
llama-bench	cli: fixed dead links to tools/main for cli and completion, fixed code owners (#17993)	2025-12-15 11:47:04 +01:00
mtmd	arg: clarify auto kvu/np being set on server (#17997)	2025-12-16 12:01:27 +01:00
perplexity	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
quantize	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
rpc	Install rpc-server when GGML_RPC is ON. (#17149)	2025-11-11 10:53:59 +00:00
run	Manually link -lbsd to resolve flock symbol on AIX (#16610)	2025-10-23 19:37:31 +08:00
server	server: validate n_batch == n_ubatch for embeddings (#6263)	2025-12-16 23:35:48 -05:00
tokenize	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
tts	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
CMakeLists.txt	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)	2025-12-15 09:24:59 +01:00