llama.cpp/ggml/src/ggml-musa
Patrick Buckley db9d8aa428
ggml-cuda: native bf16 flash attention for vec kernel (#20525)
* ggml-cuda: native bf16 flash attention for vec and tile kernels

mma kernel still converts bf16 to fp16 before launch, native mma bf16 todo

* ggml-cuda: address code owner review feedback

reverted tile kernel changes to avoid larger refactor

* fix ci failures on turing and hip

* fix bf16 vec kernel compile on hip v_dot2 platforms

* add comments

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-22 11:05:51 +01:00
..
CMakeLists.txt ggml-cuda: native bf16 flash attention for vec kernel (#20525) 2026-03-22 11:05:51 +01:00
mudnn.cu musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647) 2025-05-21 09:58:49 +08:00
mudnn.cuh musa: enable fp16 mma (all) and cublas on qy2 (#13842) 2025-06-26 12:11:59 +08:00