* CUDA: add bf16 and f32 support to cublas_mul_mat_batched * Review: add type traits and make function more generic * Review: make check more explicit, add back comments, and fix formatting * Review: fix formatting, remove useless type conversion, fix naming for bools |
||
|---|---|---|
| .. | ||
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||