llama.cpp/vendor
Radoslav Gerganov 3ace5fa277 server : add special handling for /health in httplib
When the number of parallel requests to llama-server exceeds the number
of HTTP threads, llama-server stops responding to /health. This is very
disruptive in k8s deployments because it causes restarts of properly
working inference endpoints.

Unfortunately, there is no way to fix this outside of httplib, so this
patch adds a rather ugly hack that handles GET /health requests before
they are dispatched to the thread pool.
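
The idea of answering /health before the thread pool can be sketched as
follows. This is a minimal illustration, not the actual httplib patch:
`is_health_request`, `Dispatcher`, and the `pending` queue are hypothetical
names, the queue merely stands in for the thread pool, and the response
string is a placeholder rather than what llama-server returns.

```cpp
#include <queue>
#include <string>
#include <string_view>

// Hypothetical helper (not part of httplib): true when the HTTP request line
// is a GET for /health, with or without a query string.
inline bool is_health_request(std::string_view request_line) {
    constexpr std::string_view prefix = "GET /health";
    if (request_line.substr(0, prefix.size()) != prefix) {
        return false;
    }
    if (request_line.size() == prefix.size()) {
        return true;
    }
    // Reject longer paths such as /healthz.
    const char next = request_line[prefix.size()];
    return next == ' ' || next == '?';
}

// Hypothetical dispatcher: `pending` stands in for the thread-pool queue.
struct Dispatcher {
    std::queue<std::string> pending;

    // Returns a response immediately for health checks; otherwise enqueues
    // the request for a worker and returns an empty string.
    std::string dispatch(std::string_view request_line) {
        if (is_health_request(request_line)) {
            // Placeholder response, answered without touching the pool.
            return "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";
        }
        pending.push(std::string(request_line));
        return {};
    }
};
```

Because the health check never enters the queue, it still gets an answer
when every worker is busy with inference requests.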

No changes are made in the HTTPS implementation.

closes: #20684
2026-03-20 15:44:06 +02:00
cpp-httplib server : add special handling for /health in httplib 2026-03-20 15:44:06 +02:00
miniaudio vendor : update miniaudio to 0.11.25 (#20209) 2026-03-11 11:01:56 +08:00
nlohmann sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
sheredom server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
stb sync : vendor (#13901) 2025-05-30 16:25:45 +03:00