llama.cpp/ggml
hipudding c1792d58b5 ggml: cann: add graph_optimize for multi-stream parallel preparation
Implement the ggml_backend_cann_graph_optimize function for the CANN backend,
ported from the Vulkan backend (PRs #15489 and #15850).

Key changes:
- Add graph optimization to reorder nodes based on dependency analysis
- Group non-dependent nodes together for potential parallel execution
- Preserve fusion patterns (RMS_NORM+MUL, MUL_MAT+ADD, ADD+RMS_NORM)
- Add GGML_CANN_DISABLE_GRAPH_OPTIMIZE env var to disable optimization
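
The reordering idea in the list above can be illustrated with a minimal sketch. It is not the code from this commit: ToyNode, optimize_disabled and group_independent are hypothetical names, the graph is a toy structure rather than a ggml_cgraph, and fusion-pattern preservation is left out. The interpretation of the environment variable (any non-empty value disables the pass) is also an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical toy node: `inputs` holds the indices of the nodes it reads from.
struct ToyNode {
    std::vector<int> inputs;
};

// Assumption: any non-empty value of the variable disables the optimization.
static bool optimize_disabled() {
    const char * v = std::getenv("GGML_CANN_DISABLE_GRAPH_OPTIMIZE");
    return v != nullptr && v[0] != '\0';
}

// Splits the graph into groups of original node indices. Every node in a group
// depends only on nodes from earlier groups, so the members of one group do not
// depend on each other and could, in principle, run in parallel.
static std::vector<std::vector<int>> group_independent(const std::vector<ToyNode> & nodes) {
    const int n = (int) nodes.size();
    std::vector<bool> done(n, false);
    std::vector<std::vector<int>> groups;
    int remaining = n;
    while (remaining > 0) {
        std::vector<int> group;
        for (int i = 0; i < n; ++i) {
            if (done[i]) {
                continue;
            }
            bool ready = true;
            for (int src : nodes[i].inputs) {
                assert(src >= 0 && src < n);
                if (!done[src]) {
                    ready = false;
                    break;
                }
            }
            if (ready) {
                group.push_back(i);  // original relative order is kept
            }
        }
        if (group.empty()) {
            break;  // cyclic input: stop instead of looping forever
        }
        // Mark the group as scheduled only after it is complete, so that no
        // member can be treated as "ready" because of another member.
        for (int i : group) {
            done[i] = true;
        }
        remaining -= (int) group.size();
        groups.push_back(group);
    }
    return groups;
}
```

A caller would check optimize_disabled() first and keep the original node order when it returns true. Otherwise it would concatenate the groups to get the new order.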

This is the first step toward multi-stream parallel execution on Ascend NPU.
2026-02-04 08:37:57 +00:00
cmake           ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)           2025-08-07 13:45:41 +02:00
include         ggml-virtgpu: make the code thread safe (#19204)                                   2026-02-04 10:46:18 +08:00
src             ggml: cann: add graph_optimize for multi-stream parallel preparation               2026-02-04 08:37:57 +00:00
.gitignore      vulkan : cmake integration (#8119)                                                 2024-07-13 18:12:39 +02:00
CMakeLists.txt  Bump cmake max version (needed for Windows on Snapdragon builds) (#19188)          2026-02-01 14:13:38 -08:00