Ray Smith
6098a022b3
Increased parallelism for RMSNormAndPositionalEncoding
...
PiperOrigin-RevId: 813738994
2025-10-01 07:11:14 -07:00
Ray Smith
2f6cbde8ff
Added a smaller tile size to flash attention for smaller batch sizes
...
PiperOrigin-RevId: 813226193
2025-09-30 05:49:20 -07:00
Ray Smith
4974f24832
Fixed bug with softcap in single flash attention
...
PiperOrigin-RevId: 813164938
2025-09-30 02:17:58 -07:00
Ray Smith
c9b8479f7d
Added zero-initialization to att_out.
...
Re-enabled flash attention when HWY_NATIVE_DOT_BF16 is not available.
PiperOrigin-RevId: 806284756
2025-09-12 07:48:23 -07:00
Ray Smith
f10ac41a20
Added flash attention, with both a single-q function, and a register-tiled function.
...
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.
PiperOrigin-RevId: 804913784
2025-09-09 08:05:26 -07:00