Ray Smith
|
7b55d41f46
|
Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and doubled the registers used by the core of flash attention.
PiperOrigin-RevId: 868146247
|
2026-02-13 01:58:48 -08:00 |