llama.cpp

uaruss 5d9f64c54e ggml-cuda: fix ROCm multi-GPU illegal memory access in recurrent state restore

Remove early-return optimization in ggml_cuda_set_device() that caused hipErrorIllegalAddress on ROCm multi-GPU setups with hybrid recurrent models (Mamba/SSM architectures).

On ROCm, hipGetDevice() can return an unexpected value on threads that have never explicitly called hipSetDevice(). If this value matches ctx->device, the early-return fires and hipSetDevice() is never called, causing the subsequent hipMemcpyAsync to fail with current device: -1.

cudaSetDevice() with the already-active device is a near no-op in modern CUDA/ROCm drivers, so removing the optimization has negligible performance impact while eliminating this class of thread context bugs.

Also add missing ggml_cuda_set_device() call in ggml_backend_cuda_set_tensor_async() for consistency with all other cudaMemcpyAsync call sites in this file.

Fixes #21140

Tested on: 2x AMD Radeon AI Pro R9700 (gfx1201), ROCm 7.2.0

2026-03-29 23:31:27 -04:00
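
The early-return problem described in the commit message can be illustrated with a small model. This is only a sketch, not the ggml-cuda code and not the HIP/CUDA API: `FakeThread`, `query_device`, `bind_device`, `copy_would_succeed`, and both `set_device_*` functions are hypothetical names made up for the illustration. It also assumes that the "unexpected value" reported on a thread that never bound a device is 0; the commit message does not state the actual value.

```cpp
#include <cassert>

// Hypothetical model of one host thread's GPU context (not a real HIP/CUDA type).
// bound == -1 means the thread has never explicitly selected a device.
struct FakeThread {
    int bound = -1;
};

// Assumption: on a thread with no bound device, the device query still
// reports 0 instead of signalling "nothing bound".
int query_device(const FakeThread & t) {
    return t.bound < 0 ? 0 : t.bound;
}

// Stand-in for an explicit set-device call: binds the thread to the device.
void bind_device(FakeThread & t, int device) {
    t.bound = device;
}

// Stand-in for an async copy: it only works if the thread is really bound
// to the device that owns the memory.
bool copy_would_succeed(const FakeThread & t, int device) {
    return t.bound == device;
}

// Old pattern: skip the bind when the query already reports the wanted device.
void set_device_with_early_return(FakeThread & t, int device) {
    if (query_device(t) == device) {
        return; // fires on a fresh thread when device == 0, so nothing is bound
    }
    bind_device(t, device);
}

// New pattern: always bind, relying on the call being cheap when the
// device is already active.
void set_device_always(FakeThread & t, int device) {
    bind_device(t, device);
}
```

In this model the old pattern leaves a fresh thread unbound only when the requested device equals the value the query happens to report, which matches the condition the commit message describes ("If this value matches ctx->device").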
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	llama : enable chunked fused GDN path (#20340)	2026-03-11 22:46:40 +02:00
src	ggml-cuda: fix ROCm multi-GPU illegal memory access in recurrent state restore	2026-03-29 23:31:27 -04:00
.gitignore	…
CMakeLists.txt	ggml : fix typo gmml (#20512)	2026-03-13 14:36:13 +01:00