The current RPC implementation crashes the server with `GGML_ASSERT` when `ggml_backend_graph_compute` returns a non-success status. As a result, a distributed inference setup fails completely when a single worker hits a transient error (memory pressure, backend issues, etc.).

This patch:

1. Adds the `rpc_msg_graph_compute_rsp` and `rpc_msg_graph_recompute_rsp` structs
2. Replaces `GGML_ASSERT` with graceful error logging on the server side
3. Propagates `ggml_status` back to the client via the RPC response
4. Allows clients to handle errors appropriately (retry, failover, etc.)

Fixes: https://github.com/ggml-org/llama.cpp/issues/11929
Fixes: https://github.com/gpustack/gpustack/issues/1178