Johannes Gäßler | e95d0bc8fd | CUDA: fix FA VKQ accumulator overflow (#17746) | 2025-12-05 09:18:10 +01:00 |
Johannes Gäßler | 2e1c9cd814 | CUDA: generalized (mma) FA, add Volta support (#17505) | 2025-12-03 16:57:05 +01:00 |
  * CUDA: generalized (mma) FA, add Volta support
  * use struct for MMA FA kernel config
  ---------
  Co-authored-by: Aman Gupta <aman>
R0CKSTAR | c6f7a423c8 | [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) | 2025-11-28 14:08:29 +01:00 |
  * [MUSA] enable fp16/fast_fp16/bf16_mma on PH1
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
  * Update ggml/src/ggml-cuda/fattn-vec.cuh
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
  * Update ggml/src/ggml-cuda/fattn-vec.cuh
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
  * Update ggml/src/ggml-cuda/fattn-tile.cuh
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
  * Address review comments
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
  ---------
  Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
theo77186 | 622cd010ff | ggml: CUDA: add head size 72 for flash-attn (#16962) | 2025-11-03 14:29:11 +01:00 |
Johannes Gäßler | 7049736b2d | CUDA: fix numerical issues in tile FA kernel (#16540) | 2025-10-13 17:29:45 +03:00 |
Johannes Gäßler | 11f0af5504 | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
Johannes Gäßler | 79bc429262 | CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769) | 2025-09-07 00:26:28 +02:00 |