- Remove dead code: _math and _naive variants are no longer needed
- Rename _batched to the public entry point ggml_cann_gated_delta_net
- In supports_op, return false for non-contiguous / GQA / non-F32 cases so the framework falls back to CPU instead of running the slow naive path
- The single remaining implementation uses aclnnBatchMatMul over all H heads per timestep, reducing kernel launches to O(n_seqs * n_tokens)

| Name | | |
|---|---|---|
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||