Anav Prasad
88458164c7
CUDA: Add Flash Attention Support for Head Dimension 512 ( #20998 )
* flash attention support for head dimension 512 added
* FA D=512 - match 576 configs, limit ncols2, revert vec cap
* fix HIP tile kernel build for D=512
* fix HIP tile kernel occupancy for D=512 on AMD
* Apply suggestions from code review
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* fix tile FA compilation
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-04-01 09:07:24 +02:00
Johannes Gäßler
c78e682245
CUDA: fix kernel selection logic for tile FA ( #19686 )
* CUDA: fix kernel selection logic for tile FA
* add comment
2026-02-19 12:42:58 +01:00
Aman Gupta
b70d251076
CUDA: add gqa_ratio 4 for GLM 4.7 flash ( #18953 )
2026-01-22 18:51:53 +08:00
Johannes Gäßler
5c662d21a3
CUDA: fix alignment on register spill for FA ( #18815 )
2026-01-15 15:14:50 +01:00
Johannes Gäßler
0cdce38a97
CUDA: fix FP16 overflow in tile FA kernel ( #17875 )
2025-12-09 09:34:02 +01:00
Johannes Gäßler
e95d0bc8fd
CUDA: fix FA VKQ accumulator overflow ( #17746 )
2025-12-05 09:18:10 +01:00
Johannes Gäßler
2e1c9cd814
CUDA: generalized (mma) FA, add Volta support ( #17505 )
* CUDA: generalized (mma) FA, add Volta support
* use struct for MMA FA kernel config
---------
Co-authored-by: Aman Gupta <aman>
2025-12-03 16:57:05 +01:00
R0CKSTAR
c6f7a423c8
[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 ( #17551 )
* [MUSA] enable fp16/fast_fp16/bf16_mma on PH1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/fattn-vec.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/fattn-vec.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Update ggml/src/ggml-cuda/fattn-tile.cuh
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Address review comments
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-11-28 14:08:29 +01:00
theo77186
622cd010ff
ggml: CUDA: add head size 72 for flash-attn ( #16962 )
2025-11-03 14:29:11 +01:00
Johannes Gäßler
7049736b2d
CUDA: fix numerical issues in tile FA kernel ( #16540 )
2025-10-13 17:29:45 +03:00
Johannes Gäßler
11f0af5504
CUDA: faster tile FA, add oob checks, more HSs ( #16492 )
2025-10-11 20:54:32 +02:00
Johannes Gäßler
79bc429262
CUDA: faster tile FA (Pascal/AMD), headsize 256 ( #15769 )
2025-09-07 00:26:28 +02:00