llama.cpp/ggml
hipudding 3dc7397a27
CANN: fix RoPE cache issue on multi-device (#15629)
* CANN: fix RoPE cache issue on multi-device

RoPE cache only needs to be computed once per token.
However, in multi-device scenarios, not every device starts
computation from layer 0, which may lead to unallocated memory
issues and precision errors.

This commit records the first layer of each device to avoid
the above issues.

* CANN: Optimize first-layer detection method

* CANN: Remove trailing whitespace

* CANN: Only cache the data that can be determined as unchanged through the parameters.

* CANN: Update function comment
2025-09-01 08:57:00 +08:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include llama : separate compute buffer reserve from fattn check (#15696) 2025-08-31 15:49:03 +02:00
src CANN: fix RoPE cache issue on multi-device (#15629) 2025-09-01 08:57:00 +08:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml: update kleidiai to v1.13.0 (#15663) 2025-08-31 00:03:42 +08:00