llama.cpp/ggml
Chenguang Li 10d8b2b6b0
CANN: Add ROPE sin/cos cache for reuse (#15912)
* CANN: Add ROPE sin/cos cache for reuse

Introduce sin/cos caching mechanism in ROPE to avoid redundant
computation across layers. The cache is built on the first layer
per device and reused by subsequent layers if parameters match.

- Added sin_cache / cos_cache pointers and position_length tracking
- Introduced cache validity flags and properties:
  (ext_factor, theta_scale, freq_scale, attn_factor, is_neox)
- Accelerates ROPE by eliminating repeated sin/cos generation

This change reduces overhead in multi-layer scenarios while
preserving correctness by verifying parameter consistency.

Co-authored-by: hipudding <huafengchun@gmail.com>

* fix typo

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2025-09-10 18:42:00 +08:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include cuda : fix supports_op condition for get_rows when number of blocks is too large (#15868) 2025-09-08 13:56:51 +03:00
src CANN: Add ROPE sin/cos cache for reuse (#15912) 2025-09-10 18:42:00 +08:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml-cpu: drop support for nnpa intrinsics (#15821) 2025-09-06 11:27:28 +08:00