llama.cpp

Radoslav Gerganov 3ace5fa277 server : add special handling for /health in httplib	2026-03-20 15:44:06 +02:00

When the number of parallel requests to llama-server exceeds the number of HTTP threads, llama-server stops responding to /health. This is very disruptive in k8s deployments, because it causes restarts of properly working inference endpoints. Unfortunately, there is no way to fix this outside of httplib, so this patch adds a rather ugly hack that handles GET /health requests before dispatching them to the thread pool. No changes are made to the HTTPS implementation.

closes: #20684

CMakeLists.txt	vendor : update cpp-httplib to 0.35.0 (#19969)	2026-02-28 13:53:56 +01:00
LICENSE	common : add --license to display embedded licenses (#18696)	2026-01-10 09:46:24 +01:00
httplib.cpp	server : add special handling for /health in httplib	2026-03-20 15:44:06 +02:00
httplib.h	server : add special handling for /health in httplib	2026-03-20 15:44:06 +02:00
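
The /health commit listed above describes answering GET /health before a request is handed to the thread pool. The sketch below illustrates that idea only; it is not the code from the patch. The names `is_health_request`, `Route`, `Dispatcher`, and `run_demo` are hypothetical and do not exist in cpp-httplib, and the worker pool is modelled as a simple counter rather than real threads or sockets.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical helper (not part of cpp-httplib): true if the raw HTTP
// request line is a GET for exactly /health, optionally with a query string.
inline bool is_health_request(const std::string & request_line) {
    const std::string prefix = "GET /health";
    if (request_line.size() <= prefix.size()) {
        return false;
    }
    if (request_line.compare(0, prefix.size(), prefix) != 0) {
        return false;
    }
    // Reject longer paths such as /healthz or /health/extra.
    const char next = request_line[prefix.size()];
    return next == ' ' || next == '?';
}

// Where a request ended up in this toy model.
enum class Route { Inline, Worker, Backlog };

// Assumption: a fixed-size pool whose workers stay busy once assigned,
// which mimics long-running inference requests saturating the server.
struct Dispatcher {
    std::size_t max_workers;
    std::size_t busy_workers = 0;
    std::deque<std::string> backlog;

    explicit Dispatcher(std::size_t n) : max_workers(n) {}

    Route dispatch(const std::string & request_line) {
        // Health checks are answered before the pool is consulted,
        // so they never wait behind queued inference requests.
        if (is_health_request(request_line)) {
            return Route::Inline;
        }
        if (busy_workers < max_workers) {
            ++busy_workers;
            return Route::Worker;
        }
        backlog.push_back(request_line);
        return Route::Backlog;
    }
};

// Small self-check: with one worker, the second inference request is
// queued, yet the health check is still answered immediately.
inline void run_demo() {
    Dispatcher d(1);
    assert(d.dispatch("POST /completion HTTP/1.1") == Route::Worker);
    assert(d.dispatch("POST /completion HTTP/1.1") == Route::Backlog);
    assert(d.dispatch("GET /health HTTP/1.1") == Route::Inline);
}
```

In this model a saturated pool delays only ordinary requests. The trade-off, as the commit message itself notes, is that the check has to live inside the HTTP library's dispatch path rather than in server code.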