When the number of parallel requests to llama-server exceeds the number of HTTP threads, llama-server stops responding to `/health`. This is very disruptive in Kubernetes deployments, where failed liveness probes trigger restarts of otherwise healthy inference endpoints. Unfortunately, this cannot be fixed outside of httplib, so this patch adds a rather ugly hack: `GET /health` requests are handled directly, before being dispatched to the thread pool. The HTTPS implementation is unchanged.

closes: #20684