ACL graph capture disallows host-to-device memcpy and device memory malloc/free on the captured stream. Pre-load the RoPE cache before capture so that:

- Host-to-device copies and allocations run on the non-captured stream
- Cache metadata is populated and memory pool is warmed up
- During capture, only on-device computations are recorded; host-side and allocation branches are skipped

| File | | |
|---|---|---|
| CMakeLists.txt | ||
| acl_tensor.cpp | ||
| acl_tensor.h | ||
| aclnn_ops.cpp | ||
| aclnn_ops.h | ||
| common.h | ||
| ggml-cann.cpp | ||
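
The pre-load pattern described in the commit message can be illustrated with a small, self-contained mock. This is only a sketch under stated assumptions: `MockStream`, `RopeCache`, `rope_cache_init`, and `rope_forward` are hypothetical names invented for illustration and are not the ACL API or the actual code in `aclnn_ops.cpp`. The mock also uses a single stream with a `capturing` flag, whereas the commit runs the pre-load on a separate, non-captured stream. What it shows is the control flow: once the cache is valid, the host-side and allocation branches are skipped, so only the compute op is recorded during capture.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a device stream. NOT the ACL API.
struct MockStream {
    bool capturing = false;             // true while a graph is being captured
    std::vector<std::string> recorded;  // ops recorded into the graph

    // Device allocation: rejected while capturing, as the commit describes.
    void alloc() {
        if (capturing) throw std::runtime_error("device malloc during capture");
    }
    // Host-to-device copy: also rejected while capturing.
    void memcpy_h2d() {
        if (capturing) throw std::runtime_error("H2D memcpy during capture");
    }
    // On-device computation: the only kind of work recorded during capture.
    void launch(const std::string & op) {
        if (capturing) recorded.push_back(op);
    }
};

// Hypothetical cache: metadata plus a pretend device buffer.
struct RopeCache {
    bool               valid       = false;
    std::size_t        n_positions = 0;
    std::vector<float> theta;
};

// Fill the cache if needed. When it is already valid and large enough,
// the allocation and host-to-device branches are skipped entirely.
void rope_cache_init(MockStream & s, RopeCache & c, std::size_t n_pos) {
    if (c.valid && c.n_positions >= n_pos) return;
    s.alloc();
    s.memcpy_h2d();
    c.theta.assign(n_pos, 1.0f);
    c.n_positions = n_pos;
    c.valid       = true;
}

// RoPE op: makes sure the cache exists, then launches the device kernel.
void rope_forward(MockStream & s, RopeCache & c, std::size_t n_pos) {
    rope_cache_init(s, c, n_pos);
    assert(c.valid);
    s.launch("rope");
}

// Pre-load before capture, then capture. Returns the number of recorded ops.
std::size_t capture_with_preload(std::size_t n_pos) {
    MockStream s;
    RopeCache  c;
    rope_cache_init(s, c, n_pos);  // runs outside capture
    s.capturing = true;
    rope_forward(s, c, n_pos);     // cache hit: only the kernel is recorded
    s.capturing = false;
    return s.recorded.size();
}

// Without the pre-load, the first call during capture hits a forbidden branch.
bool capture_without_preload_fails(std::size_t n_pos) {
    MockStream s;
    RopeCache  c;
    s.capturing = true;
    try {
        rope_forward(s, c, n_pos);
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

In this mock, `capture_with_preload` records exactly one op, while `capture_without_preload_fails` reports that the capture attempt was rejected because the cache had to be allocated and filled mid-capture.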