Lower GDN_CHUNK_THRESHOLD from UINT32_MAX to 2 and prefer the coopmat output pipeline (cm1) when available, falling back to the scalar variant. PP-512: ~206 → ~210 t/s on Radeon 890M (RDNA3.5).

| Name |
|---|
| .. |
| cmake |
| include |
| src |
| .gitignore |
| CMakeLists.txt |
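
The commit message above describes two decisions: a token-count threshold that gates the chunked path, and a preference for the coopmat pipeline with a scalar fallback. The following is only a minimal sketch of that selection logic, not the repository's code. Apart from `GDN_CHUNK_THRESHOLD`, every name here (`OutputPipeline`, `DeviceCaps`, `use_chunked_path`, `select_output_pipeline`) is hypothetical, and the comparison direction (chunking when the token count exceeds the threshold) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Only GDN_CHUNK_THRESHOLD is named in the commit message; everything else
// in this sketch is a hypothetical stand-in for illustration.
constexpr uint32_t GDN_CHUNK_THRESHOLD = 2;  // was UINT32_MAX before the change

// Which output pipeline (if any) the chunked path would use.
enum class OutputPipeline { None, Scalar, CoopMat1 };

// Hypothetical device capability record.
struct DeviceCaps {
    bool coopmat1_available;
};

// Assumption: the chunked path is taken when n_tokens exceeds the threshold.
// Under that assumption, a threshold of UINT32_MAX can never be exceeded,
// so the chunked path would stay disabled.
constexpr bool use_chunked_path(uint32_t n_tokens,
                                uint32_t threshold = GDN_CHUNK_THRESHOLD) {
    return n_tokens > threshold;
}

// Prefer the coopmat variant when the device reports it, else use scalar.
constexpr OutputPipeline select_output_pipeline(const DeviceCaps & caps,
                                                uint32_t n_tokens) {
    if (!use_chunked_path(n_tokens)) {
        return OutputPipeline::None;
    }
    return caps.coopmat1_available ? OutputPipeline::CoopMat1
                                   : OutputPipeline::Scalar;
}
```

Under these assumptions, a 512-token prompt on a device with coopmat support would select the coopmat variant, the same prompt on a device without it would select the scalar variant, and no token count could trigger the chunked path while the threshold was `UINT32_MAX`.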