Commit Graph

3 Commits

Author: Ray Smith
SHA1: 7b55d41f46
Message: Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
PiperOrigin-RevId: 868146247
Date: 2026-02-13 01:58:48 -08:00

Author: Martin Stolle
SHA1: 49d420aeaf
Message: Add some comments.
PiperOrigin-RevId: 834173319
Date: 2025-11-19 01:09:15 -08:00

Author: Ray Smith
SHA1: 8a100c1e8d
Message: Added access to flash attention internals to TileFlashAttention4
PiperOrigin-RevId: 826011137
Date: 2025-10-30 06:50:05 -07:00