Ray Smith
14244664c8
Avoid transposing Q when it isn't needed
PiperOrigin-RevId: 814187984
2025-10-02 05:16:35 -07:00
Jan Wassenberg
fe5a39990e
Improve FlashAttention threading:
kFlat for RMSNorm (hierarchical is excessive),
profiler zone naming improvements.
PiperOrigin-RevId: 814144012
2025-10-02 02:37:05 -07:00
Ray Smith
6098a022b3
Increased parallelism for RMSNormAndPositionalEncoding
PiperOrigin-RevId: 813738994
2025-10-01 07:11:14 -07:00
Ray Smith
2f6cbde8ff
Added a smaller tile size to flash attention for smaller batch sizes
PiperOrigin-RevId: 813226193
2025-09-30 05:49:20 -07:00
Ray Smith
4974f24832
Fixed bug with softcap in single flash attention
PiperOrigin-RevId: 813164938
2025-09-30 02:17:58 -07:00
Ray Smith
c9b8479f7d
Added zero-initialization to att_out.
Re-enabled flash attention when HWY_NATIVE_DOT_BF16 is not available.
PiperOrigin-RevId: 806284756
2025-09-12 07:48:23 -07:00
Ray Smith
f10ac41a20
Added flash attention with both a single-q function and a register-tiled function.
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.
PiperOrigin-RevId: 804913784
2025-09-09 08:05:26 -07:00