- Replace tensor-pointer-based dependency tracking with memory-address-based tracking
- Use std::map<void*, int> to track pending writes per stream
- Implement smart stream selection:
  - No dependencies: round-robin distribution
  - Single dependency: execute on the same stream (avoids sync overhead)
  - Multiple dependencies: sync all streams
- Add WAW (Write-After-Write) hazard detection
- Fix output corruption issue when using multi-stream execution

Enable with: GGML_CANN_MULTI_STREAM=1
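The stream-selection rules in this commit can be modeled with a small host-side scheduler. This is a hypothetical sketch, not the ggml-cann implementation. `StreamScheduler`, `schedule` and `full_syncs` are illustrative names. The sketch only records decisions and does not launch work on real CANN streams. It also assumes that an op is placed round-robin after a full sync, because the message does not say which stream is used in that case. Like the commit, it keys pending writes by buffer address. It models reads of pending writes and WAW hazards only.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical model of the stream-selection policy; it only records decisions
// and never creates, launches on, or synchronizes real device streams.
class StreamScheduler {
public:
    explicit StreamScheduler(int num_streams) : num_streams_(num_streams) {
        assert(num_streams > 0 && "need at least one stream");
    }

    // Chooses a stream for an op that reads `srcs` and writes `dst`.
    int schedule(const std::vector<void*>& srcs, void* dst) {
        std::set<int> dep_streams;
        // Inputs whose producing write has not been synchronized yet.
        for (void* addr : srcs) {
            auto it = pending_writes_.find(addr);
            if (it != pending_writes_.end()) dep_streams.insert(it->second);
        }
        // WAW hazard: an earlier pending write to the same address.
        auto waw = pending_writes_.find(dst);
        if (waw != pending_writes_.end()) dep_streams.insert(waw->second);

        int stream;
        if (dep_streams.empty()) {
            stream = next_round_robin();    // independent op: spread the load
        } else if (dep_streams.size() == 1) {
            stream = *dep_streams.begin();  // stream order already serializes it, no sync
        } else {
            ++full_syncs_;                  // a real backend would wait on every stream here
            pending_writes_.clear();        // after a full sync nothing is pending
            stream = next_round_robin();    // assumption: placement after a sync
        }
        pending_writes_[dst] = stream;      // dst now has an un-synced write on `stream`
        return stream;
    }

    int full_syncs() const { return full_syncs_; }

private:
    int next_round_robin() {
        int s = next_;
        next_ = (next_ + 1) % num_streams_;
        return s;
    }

    int num_streams_;
    int next_ = 0;
    int full_syncs_ = 0;
    std::map<void*, int> pending_writes_;  // buffer address -> stream holding its pending write
};
```

With two streams, two independent ops land on streams 0 and 1. An op that reads only the first result stays on stream 0. An op that reads results from both streams triggers one full sync. A later write to that op's output buffer is then kept on the same stream because of the WAW check.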