llama.cpp/ggml
Matthias Fahsold 3b9b01a5b6 ggml-rpc: Add graceful error handling for graph compute operations
The current RPC implementation crashes the server with GGML_ASSERT when
ggml_backend_graph_compute returns a non-success status. This causes
distributed inference setups to fail completely when a single worker
encounters a temporary error (memory pressure, backend issues, etc.).

This patch:
1. Adds rpc_msg_graph_compute_rsp and rpc_msg_graph_recompute_rsp structs
2. Replaces GGML_ASSERT with graceful error logging on the server side
3. Propagates ggml_status back to the client via RPC response
4. Allows clients to handle errors appropriately (retry, failover, etc.)
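
The four steps above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the patch: the `sketch_` names, the single `status` field, the `compute_fn` hook, and the retry policy are assumptions made for the example. Only the numeric status values are meant to mirror `enum ggml_status`.

```cpp
#include <cstdint>
#include <cstdio>

// Stand-in for ggml_status so the sketch builds without ggml headers.
// The values are intended to mirror enum ggml_status.
enum sketch_status : int32_t {
    SKETCH_STATUS_ALLOC_FAILED = -2,
    SKETCH_STATUS_FAILED       = -1,
    SKETCH_STATUS_SUCCESS      =  0,
    SKETCH_STATUS_ABORTED      =  1,
};

// Hypothetical response message; the real rpc_msg_graph_compute_rsp
// layout is not shown in the commit message and may differ.
struct sketch_graph_compute_rsp {
    int32_t status;
};

// Hypothetical hook standing in for the backend graph compute call.
using compute_fn = sketch_status (*)();

// Server side: log the failure and report it instead of asserting.
inline sketch_graph_compute_rsp handle_graph_compute(compute_fn compute) {
    const sketch_status st = compute();
    if (st != SKETCH_STATUS_SUCCESS) {
        std::fprintf(stderr, "graph compute failed with status %d\n", static_cast<int>(st));
    }
    return { static_cast<int32_t>(st) };
}

// Client side: one possible policy for reacting to the returned status.
enum class client_action { proceed, retry, fail };

inline client_action decide(const sketch_graph_compute_rsp & rsp) {
    switch (rsp.status) {
        case SKETCH_STATUS_SUCCESS:      return client_action::proceed;
        case SKETCH_STATUS_ALLOC_FAILED: return client_action::retry; // assumed to be transient
        default:                         return client_action::fail;
    }
}
```

In this sketch the server keeps running after a failed compute, and the retry or failover decision stays with the client, which matches the intent described in points 2 to 4.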

Fixes: https://github.com/ggml-org/llama.cpp/issues/11929
Fixes: https://github.com/gpustack/gpustack/issues/1178
2026-01-19 17:11:51 +01:00
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten. (#18628) 2026-01-08 08:36:42 -08:00
src ggml-rpc: Add graceful error handling for graph compute operations 2026-01-19 17:11:51 +01:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : bump version to 0.9.5 (ggml/1410) 2025-12-31 18:54:43 +02:00