llama.cpp/tools
Radoslav Gerganov 68ee98ae18
server : return HTTP 400 if prompt exceeds context length (#16486)
In streaming mode when prompt exceeds context length, the server returns
HTTP 200 status code with a JSON error in the body.  This is very
confusing and inconsistent with all other inference engines which return
HTTP 4xx error in this case.

This patch fixes this problem and makes the server return HTTP 400 in
such cases.
2025-10-10 16:11:07 +02:00
..
batched-bench cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
cvector-generator cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
export-lora cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
gguf-split ci : use smaller model (#16168) 2025-09-22 09:11:39 +03:00
imatrix cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
llama-bench llama : add --no-host to disable host buffers (#16310) 2025-10-06 19:55:53 +02:00
main llama-cli: prevent spurious assistant token (#16202) 2025-09-29 10:03:12 +03:00
mtmd chat : Granite Docling stopping (#16438) 2025-10-06 18:59:40 +02:00
perplexity perplexity : show more kl-divergence data (#16321) 2025-09-29 09:30:45 +03:00
quantize ci : use smaller model (#16168) 2025-09-22 09:11:39 +03:00
rpc rpc : update documentation (#16441) 2025-10-07 06:59:13 +00:00
run common: introduce http.h for httplib-based client (#16373) 2025-10-01 20:22:18 +03:00
server server : return HTTP 400 if prompt exceeds context length (#16486) 2025-10-10 16:11:07 +02:00
tokenize cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
tts model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
CMakeLists.txt mtmd : rename llava directory to mtmd (#13311) 2025-05-05 16:02:55 +02:00