llama.cpp/tools
ytian218 27228447d9 server: fix crash when batch > ubatch with embeddings (#12836)
Fixes #12836, where the server crashes with a GGML_ASSERT failure when
running with embeddings enabled and n_batch > n_ubatch.

Root cause: Embeddings use non-causal attention, which requires all
tokens to be processed within a single ubatch. When n_batch > n_ubatch,
the server attempts to split processing, which triggers the assertion
failure.

Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings are enabled and n_batch > n_ubatch:
  * Log warnings explaining the issue
  * Automatically set n_batch = n_ubatch
  * Continue startup instead of crashing later
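
The validation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the struct `batch_params_sketch`, the function `clamp_batch_for_embeddings`, the warning text, and the values used in the tests are all hypothetical stand-ins for the real parsed parameters in tools/server/server.cpp.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for the parsed parameters involved in the check.
// The real server uses its own parameter struct; the names here are assumptions.
struct batch_params_sketch {
    bool    embedding = false; // set by --embedding
    int32_t n_batch   = 0;     // set by -b  (logical batch size)
    int32_t n_ubatch  = 0;     // set by -ub (physical micro-batch size)
};

// Applies the startup rule from the commit message: with embeddings enabled,
// n_batch must not exceed n_ubatch. Returns true if n_batch was adjusted.
inline bool clamp_batch_for_embeddings(batch_params_sketch & p) {
    if (!p.embedding || p.n_batch <= p.n_ubatch) {
        return false; // valid combination: no warning, no change
    }
    // Illustrative warning text only; the actual log message may differ.
    std::fprintf(stderr,
                 "warning: embeddings require n_batch <= n_ubatch; "
                 "setting n_batch = %d (was %d)\n",
                 static_cast<int>(p.n_ubatch), static_cast<int>(p.n_batch));
    p.n_batch = p.n_ubatch;
    return true;
}
```

Calling a check like this once, right after argument parsing, means an invalid combination is corrected before any context is created, so the assertion is never reached at request time.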

This follows the approach suggested by @ggerganov in issue #12836.

Note: This supersedes the stalled PR #12940, which attempted a runtime
fix in the old examples/server/server.cpp location. This implementation
validates at startup in tools/server/server.cpp (the current location).

Testing:
- Build: Compiles successfully
- Validation triggers: Warns when -b > -ub with --embedding
- Auto-correction works: Adjusts n_batch = n_ubatch
- No false positives: Valid params don't trigger warnings
- Verified on macOS M3 Pro with embedding model
2025-12-10 18:32:53 -05:00
batched-bench      batched-bench : add "separate text gen" mode (#17103)  2025-11-10 12:59:29 +02:00
cvector-generator  cmake : Do not install tools on iOS targets (#15903)  2025-09-16 09:54:44 +07:00
export-lora        cmake : Do not install tools on iOS targets (#15903)  2025-09-16 09:54:44 +07:00
gguf-split         ci : use smaller model (#16168)  2025-09-22 09:11:39 +03:00
imatrix            Manually link -lbsd to resolve flock symbol on AIX (#16610)  2025-10-23 19:37:31 +08:00
llama-bench        bench : cache the llama_context state at computed depth (#16944)  2025-11-07 21:23:11 +02:00
main               common : more accurate sampling timing (#17382)  2025-11-20 13:40:10 +02:00
mtmd               mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277)  2025-11-15 12:41:16 +01:00
perplexity         perplexity : show more kl-divergence data (#16321)  2025-09-29 09:30:45 +03:00
quantize           ci : use smaller model (#16168)  2025-09-22 09:11:39 +03:00
rpc                Install rpc-server when GGML_RPC is ON. (#17149)  2025-11-11 10:53:59 +00:00
run                Manually link -lbsd to resolve flock symbol on AIX (#16610)  2025-10-23 19:37:31 +08:00
server             server: fix crash when batch > ubatch with embeddings (#12836)  2025-12-10 18:32:53 -05:00
tokenize           cmake : Do not install tools on iOS targets (#15903)  2025-09-16 09:54:44 +07:00
tts                model : Apertus model implementation (#15852)  2025-10-02 20:43:22 +03:00
CMakeLists.txt     mtmd : rename llava directory to mtmd (#13311)  2025-05-05 16:02:55 +02:00