- Implement 4x4 warptile tuning for Mali-G720/Immortalis MC12.
- Optimize tuning parameters for ARM Mali and Qualcomm Adreno.
- Fix matrix multiplication out-of-bounds (OOB) access by moving restrictions to initialization.
- Ensure stability by removing risky subgroup size clamping on Qualcomm devices.

| | | |
|---|---|---|
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||