Ray Smith | a814aa411e | 2026-02-16 03:55:43 -08:00
Rewrote flash attention to use BF16, transpose k and v, restructure the task distribution, increase parallelism on decode, and use twice as many registers for the core of flash attention.
PiperOrigin-RevId: 868146247

Ray Smith | 76d7951242 | 2026-02-13 06:05:30 -08:00
Added wheat_from_chaff_test to test a model's ability to find a needle in a haystack of data.
Replaced a flag with attention_impl to control which attention implementation runs.
PiperOrigin-RevId: 869694868

Balazs Racz | baa69dfb78 | 2025-12-16 01:56:52 -08:00
Passes the entire runtime_config to the activations constructor.
PiperOrigin-RevId: 845153671

Martin Stolle | 1014ae9e2a | 2025-12-09 02:13:03 -08:00
Added a simple test for GemmaAttention.
PiperOrigin-RevId: 842135414