Commit Graph

4 Commits

Author           SHA1        Message                                                   Date
theo77186        622cd010ff  ggml: CUDA: add head size 72 for flash-attn (#16962)      2025-11-03 14:29:11 +01:00
Johannes Gäßler  7049736b2d  CUDA: fix numerical issues in tile FA kernel (#16540)     2025-10-13 17:29:45 +03:00
Johannes Gäßler  11f0af5504  CUDA: faster tile FA, add oob checks, more HSs (#16492)   2025-10-11 20:54:32 +02:00
Johannes Gäßler  79bc429262  CUDA: faster tile FA (Pascal/AMD), headsize 256 (#15769)  2025-09-07 00:26:28 +02:00