llama.cpp

History

Patrick Buckley	db9d8aa428	2026-03-22 11:05:51 +01:00

ggml-cuda: native bf16 flash attention for vec kernel (#20525)

* ggml-cuda: native bf16 flash attention for vec and tile kernels

  mma kernel still converts bf16 to fp16 before launch, native mma bf16 todo

* ggml-cuda: address code owner review feedback

  reverted tile kernel changes to avoid larger refactor

* fix ci failures on turing and hip

* fix bf16 vec kernel compile on hip v_dot2 platforms

* add comments

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
CMakeLists.txt	ggml-cuda: native bf16 flash attention for vec kernel (#20525)	2026-03-22 11:05:51 +01:00