Commit Graph

4 Commits

Author SHA1 Message Date
Ray Smith 49cb438b1e Rollback of erroneous rollback.
PiperOrigin-RevId: 877376165
2026-03-02 06:50:26 -08:00
The gemma.cpp Authors a3d994915f No public description
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith 16c1b29b89 Rewrote flash attention to use BF16, transpose K and V, redistribute tasks, increase parallelism on decode, and use twice as many registers in the core of flash attention.
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
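Flash attention avoids materializing the full score matrix. For each query it keeps a running maximum and a running softmax denominator, and it rescales the partial output whenever the maximum grows. The sketch below shows that online-softmax recurrence for a single query over K/V tiles in plain float32 C++. It is only an illustration of the general technique. It does not reproduce this commit's BF16 storage, K/V transposition, task distribution, or register blocking, and `OnlineSoftmaxAttention` is a hypothetical name, not a gemma.cpp function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: attention output for one query, computed in a single
// pass over the keys using the online-softmax recurrence, one tile at a time.
// k and v are row-major [num_keys x dim]. Scalar float32 sketch (no SIMD/BF16).
std::vector<float> OnlineSoftmaxAttention(const std::vector<float>& q,
                                          const std::vector<float>& k,
                                          const std::vector<float>& v,
                                          std::size_t num_keys, std::size_t dim,
                                          std::size_t tile) {
  assert(q.size() == dim && k.size() == num_keys * dim && v.size() == k.size());
  assert(tile > 0);
  const float scale = 1.0f / std::sqrt(static_cast<float>(dim));
  float running_max = -std::numeric_limits<float>::infinity();
  float running_sum = 0.0f;           // sum of exp(score - running_max)
  std::vector<float> acc(dim, 0.0f);  // unnormalized weighted sum of V rows
  std::vector<float> scores(tile);

  for (std::size_t start = 0; start < num_keys; start += tile) {
    const std::size_t end = std::min(start + tile, num_keys);
    // 1) Scaled dot products for this tile, plus the tile's maximum score.
    float tile_max = -std::numeric_limits<float>::infinity();
    for (std::size_t j = start; j < end; ++j) {
      float s = 0.0f;
      for (std::size_t d = 0; d < dim; ++d) s += q[d] * k[j * dim + d];
      scores[j - start] = s * scale;
      tile_max = std::max(tile_max, scores[j - start]);
    }
    // 2) Rescale earlier partial results to the new maximum
    //    (exp(-inf) == 0 on the first tile, so nothing stale survives).
    const float new_max = std::max(running_max, tile_max);
    const float correction = std::exp(running_max - new_max);
    running_sum *= correction;
    for (float& a : acc) a *= correction;
    // 3) Add this tile's contributions.
    for (std::size_t j = start; j < end; ++j) {
      const float p = std::exp(scores[j - start] - new_max);
      running_sum += p;
      for (std::size_t d = 0; d < dim; ++d) acc[d] += p * v[j * dim + d];
    }
    running_max = new_max;
  }
  for (float& a : acc) a /= running_sum;  // final softmax normalization
  return acc;
}
```

Because only one tile of scores is live at a time, the working set stays small regardless of context length; the result matches an ordinary two-pass softmax up to float rounding.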
Krzysztof Rymski df162ead7c Implementation of tiled attention with BF16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models.
It also provides better parallelism for small batch sizes and small models.
It can also use VDPBF16PS for a 2x improvement on AVX-512.

PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
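BF16 keeps float32's 8-bit exponent but only 7 explicit mantissa bits, so each value takes half the storage of a float. VDPBF16PS (AVX512_BF16) multiplies pairs of adjacent BF16 elements and accumulates their products into a float32 lane. The scalar sketch below models that behavior, assuming round-to-nearest-even for the float-to-BF16 conversion. It ignores the instruction's denormal handling and the exact order and rounding of its intermediate steps, and the function names are hypothetical, not gemma.cpp or Highway APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: float32 -> bfloat16 bits, round-to-nearest-even.
// (NaN inputs are not special-cased in this sketch.)
uint16_t FloatToBF16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  const uint32_t rounding_bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + rounding_bias) >> 16);
}

// bfloat16 is the upper half of an IEEE float32, so widening is a shift.
float BF16ToFloat(uint16_t h) {
  const uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Scalar model of one VDPBF16PS-style step: float32 lane i of `acc`
// accumulates the products of BF16 pair (2i, 2i+1) from a and b.
void DotBF16PairsAccumulate(const std::vector<uint16_t>& a,
                            const std::vector<uint16_t>& b,
                            std::vector<float>& acc) {
  assert(a.size() == b.size() && a.size() == 2 * acc.size());
  for (std::size_t i = 0; i < acc.size(); ++i) {
    acc[i] += BF16ToFloat(a[2 * i]) * BF16ToFloat(b[2 * i]);
    acc[i] += BF16ToFloat(a[2 * i + 1]) * BF16ToFloat(b[2 * i + 1]);
  }
}
```

Each step consumes two BF16 inputs per float32 output lane, which is the source of the throughput gain over a float32 multiply-add that handles one element per lane.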