llama.cpp/ggml/src/ggml-cann
Chenguang Li 10d8b2b6b0
CANN: Add ROPE sin/cos cache for reuse (#15912)
* CANN: Add ROPE sin/cos cache for reuse

Introduce sin/cos caching mechanism in ROPE to avoid redundant
computation across layers. The cache is built on the first layer
per device and reused by subsequent layers if parameters match.

- Added sin_cache / cos_cache pointers and position_length tracking
- Introduced cache validity flags and properties:
  (ext_factor, theta_scale, freq_scale, attn_factor, is_neox)
- Accelerates ROPE by eliminating repeated sin/cos generation

This change reduces overhead in multi-layer scenarios while
preserving correctness by verifying parameter consistency.
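
The reuse rule described above can be sketched as follows. This is a minimal
host-side illustration, not the code from aclnn_ops.cpp: the names RopeParams,
RopeCache, and rope_cache_update are hypothetical, std::vector stands in for
device buffers, and the angle formula omits the ext_factor (YaRN) correction.
It assumes the first layer always rebuilds the tables and later layers reuse
them only when every tracked parameter matches.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical parameter set; field names follow the commit message.
struct RopeParams {
    float ext_factor;
    float theta_scale;
    float freq_scale;
    float attn_factor;
    bool  is_neox;

    bool operator==(const RopeParams & o) const {
        return ext_factor == o.ext_factor && theta_scale == o.theta_scale &&
               freq_scale == o.freq_scale && attn_factor == o.attn_factor &&
               is_neox == o.is_neox;
    }
};

// Hypothetical per-device cache; host vectors stand in for device memory.
struct RopeCache {
    std::vector<float> sin_cache;
    std::vector<float> cos_cache;
    int64_t    position_length = 0;
    bool       valid           = false;
    RopeParams params{};
    int        rebuilds        = 0;  // illustration only: counts table builds
};

// Returns true if the tables were rebuilt, false if they were reused.
bool rope_cache_update(RopeCache & c, const RopeParams & p,
                       const std::vector<int32_t> & positions,
                       int n_dims, bool is_first_layer) {
    assert(n_dims > 0 && n_dims % 2 == 0);
    const int64_t n_pos = static_cast<int64_t>(positions.size());

    // Later layers reuse the tables only if everything they depend on matches.
    if (!is_first_layer && c.valid && c.params == p && c.position_length == n_pos) {
        return false;
    }

    const int half = n_dims / 2;
    c.sin_cache.assign(static_cast<std::size_t>(n_pos) * half, 0.0f);
    c.cos_cache.assign(static_cast<std::size_t>(n_pos) * half, 0.0f);

    for (int64_t i = 0; i < n_pos; ++i) {
        const float theta_base = static_cast<float>(positions[i]) * p.freq_scale;
        float scale = 1.0f;  // theta_scale^j, accumulated per dimension pair
        for (int j = 0; j < half; ++j) {
            const float theta = theta_base * scale;
            c.sin_cache[i * half + j] = std::sin(theta) * p.attn_factor;
            c.cos_cache[i * half + j] = std::cos(theta) * p.attn_factor;
            scale *= p.theta_scale;
        }
    }

    c.position_length = n_pos;
    c.params          = p;
    c.valid           = true;
    ++c.rebuilds;
    return true;
}
```

In this sketch a caller would pass is_first_layer = true once per graph
evaluation and false for every later layer, so the sin/cos tables are computed
once and shared across layers unless a parameter changes.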

Co-authored-by: hipudding <huafengchun@gmail.com>

* fix typo

Signed-off-by: noemotiovon <757486878@qq.com>

---------

Signed-off-by: noemotiovon <757486878@qq.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2025-09-10 18:42:00 +08:00
| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | CANN: add support for ACL Graph (#15065) | 2025-08-06 14:12:42 +08:00 |
| Doxyfile | CANN: Add the basic supports of Flash Attention kernel (#13627) | 2025-05-26 10:20:18 +08:00 |
| acl_tensor.cpp | CANN: Implement GLU ops (#14884) | 2025-07-26 17:56:18 +08:00 |
| acl_tensor.h | CANN: Add the basic supports of Flash Attention kernel (#13627) | 2025-05-26 10:20:18 +08:00 |
| aclnn_ops.cpp | CANN: Add ROPE sin/cos cache for reuse (#15912) | 2025-09-10 18:42:00 +08:00 |
| aclnn_ops.h | CANN: Add ggml_set_rows (#14943) | 2025-07-29 22:36:43 +08:00 |
| common.h | CANN: Add ROPE sin/cos cache for reuse (#15912) | 2025-09-10 18:42:00 +08:00 |
| ggml-cann.cpp | CANN: Add ROPE sin/cos cache for reuse (#15912) | 2025-09-10 18:42:00 +08:00 |