The current RPC implementation crashes the server with `GGML_ASSERT` when `ggml_backend_graph_compute` returns a non-success status. As a result, a distributed inference setup fails completely when a single worker hits a transient error (memory pressure, backend issues, etc.).

This patch:

1. Adds the `rpc_msg_graph_compute_rsp` and `rpc_msg_graph_recompute_rsp` structs
2. Replaces `GGML_ASSERT` with graceful error logging on the server side
3. Propagates `ggml_status` back to the client via the RPC response
4. Allows clients to handle errors appropriately (retry, failover, etc.)

Fixes: https://github.com/ggml-org/llama.cpp/issues/11929
Fixes: https://github.com/gpustack/gpustack/issues/1178