Commit Graph

7 Commits

Author SHA1 Message Date
Ray Smith 14244664c8 Avoid transposing Q when it isn't needed
PiperOrigin-RevId: 814187984
2025-10-02 05:16:35 -07:00
Jan Wassenberg fe5a39990e Improve FlashAttention threading:
kFlat for RMSNorm (hierarchical is excessive),
profiler zone naming improvements.

PiperOrigin-RevId: 814144012
2025-10-02 02:37:05 -07:00
Ray Smith 6098a022b3 Increased parallelism for RMSNormAndPositionalEncoding
PiperOrigin-RevId: 813738994
2025-10-01 07:11:14 -07:00
Ray Smith 2f6cbde8ff Added a smaller tile size to flash attention for smaller batch sizes
PiperOrigin-RevId: 813226193
2025-09-30 05:49:20 -07:00
Ray Smith 4974f24832 Fixed bug with softcap in single flash attention
PiperOrigin-RevId: 813164938
2025-09-30 02:17:58 -07:00
Ray Smith c9b8479f7d Added zero-initialization to att_out.
Re-enabled flash attention when HWY_NATIVE_DOT_BF16 is not available.

PiperOrigin-RevId: 806284756
2025-09-12 07:48:23 -07:00
Ray Smith f10ac41a20 Added flash attention, with both a single-q function and a register-tiled function.
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.

PiperOrigin-RevId: 804913784
2025-09-09 08:05:26 -07:00
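
The single-q flash attention and softcap mentioned in the messages above can be illustrated with a minimal scalar sketch. This is not the code from these commits: the names `SoftCap`, `SingleQueryFlashAttention`, and `NaiveAttention`, the `std::vector` layout, and the `cap * tanh(x / cap)` form of soft-capping are all assumptions chosen for illustration, and no SIMD or register tiling is shown. The sketch only demonstrates the one-pass "online softmax" idea, where a running maximum and running sum let the output be accumulated without storing all logits, and compares it with a two-pass reference.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // hypothetical [rows][dim] layout

// Assumed soft-capping form: cap * tanh(x / cap); cap <= 0 disables it.
inline float SoftCap(float x, float cap) {
  return cap > 0.0f ? cap * std::tanh(x / cap) : x;
}

inline float Logit(const std::vector<float>& q, const std::vector<float>& k,
                   float scale, float cap) {
  assert(q.size() == k.size());
  float dot = 0.0f;
  for (size_t d = 0; d < q.size(); ++d) dot += q[d] * k[d];
  return SoftCap(dot * scale, cap);
}

// One pass over the keys: keeps a running max and running sum so that
// previously accumulated output can be rescaled when a larger logit appears.
std::vector<float> SingleQueryFlashAttention(const std::vector<float>& q,
                                             const Matrix& k, const Matrix& v,
                                             float scale, float cap) {
  assert(k.size() == v.size());
  std::vector<float> out(q.size(), 0.0f);  // zero-initialized accumulator
  float running_max = -std::numeric_limits<float>::infinity();
  float running_sum = 0.0f;
  for (size_t i = 0; i < k.size(); ++i) {
    assert(v[i].size() == out.size());
    const float logit = Logit(q, k[i], scale, cap);
    const float new_max = std::max(running_max, logit);
    // exp(-inf) == 0 on the first key, which discards the empty state.
    const float correction = std::exp(running_max - new_max);
    const float weight = std::exp(logit - new_max);
    running_sum = running_sum * correction + weight;
    for (size_t d = 0; d < out.size(); ++d) {
      out[d] = out[d] * correction + weight * v[i][d];
    }
    running_max = new_max;
  }
  if (running_sum > 0.0f) {
    for (float& x : out) x /= running_sum;
  }
  return out;
}

// Two-pass reference: compute all logits, then softmax, then weighted sum.
std::vector<float> NaiveAttention(const std::vector<float>& q, const Matrix& k,
                                  const Matrix& v, float scale, float cap) {
  assert(k.size() == v.size());
  std::vector<float> logits(k.size());
  float max_logit = -std::numeric_limits<float>::infinity();
  for (size_t i = 0; i < k.size(); ++i) {
    logits[i] = Logit(q, k[i], scale, cap);
    max_logit = std::max(max_logit, logits[i]);
  }
  float sum = 0.0f;
  for (float& l : logits) {
    l = std::exp(l - max_logit);
    sum += l;
  }
  std::vector<float> out(q.size(), 0.0f);
  for (size_t i = 0; i < k.size(); ++i) {
    for (size_t d = 0; d < out.size(); ++d) out[d] += logits[i] / sum * v[i][d];
  }
  return out;
}
```

Both functions should agree up to floating-point rounding. The one-pass version never materializes the full logit vector, which is the property that makes tiling over keys possible.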