Commit Graph

4 Commits

Author SHA1 Message Date
Ray Smith a814aa411e Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and doubled the registers used for the core of flash attention.
PiperOrigin-RevId: 868146247
2026-02-16 03:55:43 -08:00
Ray Smith 76d7951242 Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data.
Replaced flag with attention_impl to control which attention to run.

PiperOrigin-RevId: 869694868
2026-02-13 06:05:30 -08:00
Balazs Racz baa69dfb78 Passes the entire runtime_config into the activations constructor.
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Martin Stolle 1014ae9e2a Adding a simple test for GemmaAttention
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00