llama.cpp/ggml/src/ggml-rpc
Radoslav Gerganov 15d2b46b4d
rpc : cache and reuse compute graphs (#15405)
Store the last computed graph and reuse it when possible.
Also do not return response from GRAPH_COMPUTE and assume it always
completes successfully. If this is not the case, the server closes
the connection. This saves us a network round trip to the server.
2025-11-28 08:33:51 +00:00
..
CMakeLists.txt ggml : add support for dynamic loading of backends (#10469) 2024-11-25 15:13:39 +01:00
ggml-rpc.cpp rpc : cache and reuse compute graphs (#15405) 2025-11-28 08:33:51 +00:00
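
The commit message above describes two ideas: reusing the last computed graph instead of resending it, and not waiting for a reply to GRAPH_COMPUTE. The following is a minimal, self-contained sketch of that pattern, not the code in ggml-rpc.cpp. Every name in it (`toy_server`, `toy_client`, `graph_bytes`, `graph_recompute`) is hypothetical, the "network" is a direct function call, and comparing serialized bytes to detect an unchanged graph is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a serialized compute graph.
using graph_bytes = std::vector<uint8_t>;

// Hypothetical server side: remembers the last graph it received.
struct toy_server {
    graph_bytes cached;
    int full_uploads = 0; // times a full graph was transferred
    int reuses       = 0; // times the cached graph was reused

    // Receives a full graph, stores it, then (conceptually) computes it.
    void graph_compute(const graph_bytes & g) {
        cached = g;
        full_uploads++;
    }

    // Recomputes the cached graph; no graph payload is transferred.
    void graph_recompute() {
        assert(!cached.empty() && "recompute requires a previously stored graph");
        reuses++;
    }
};

// Hypothetical client side: sends the full graph only when it changed.
struct toy_client {
    toy_server & srv;
    graph_bytes  last_sent;
    bool         has_last = false;

    explicit toy_client(toy_server & s) : srv(s) {}

    // Fire-and-forget: neither branch waits for a status reply. In the scheme
    // described by the commit message, a failure would surface as the server
    // closing the connection rather than as an error response.
    void compute(const graph_bytes & g) {
        if (has_last && g == last_sent) {
            srv.graph_recompute();
        } else {
            srv.graph_compute(g);
            last_sent = g;
            has_last  = true;
        }
    }
};
```

In this sketch, calling `compute` repeatedly with an unchanged graph transfers the payload once and takes the cheap reuse path afterwards, which is the kind of saving the commit is after when the same graph is evaluated many times in a row.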