* vulkan: Use larger workgroups for mul_mat_vec when M is small

  Also use subgroup instructions for (part of) the reduction when supported. Without this, the more expensive reductions would eat into the benefits of the larger workgroups.

* update heuristic for amd/intel

Co-authored-by: 0cc4m <picard12@live.de>

| | | |
|---|---|---|
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||