Commit Graph

7 Commits

Author SHA1 Message Date
Krzysztof Rymski 8a5e37eeb7 Updates to tests to use kv_transcodign library to reduce theris code size
PiperOrigin-RevId: 888600365
2026-03-24 05:06:01 -07:00
Ray Smith bea8b1cdbd Replaced attention in ViT with flash - 8x speedup of image tokenizer on AMD
PiperOrigin-RevId: 880877209
2026-03-09 08:46:04 -07:00
Krzysztof Rymski 029cfd0b33 Int8 + microscaling support for kv cache formats.
Right now multiplication is done by converting to corresponding float format.
Can yield up to 2x improvements for membw constrained shapes

PiperOrigin-RevId: 880748493
2026-03-09 02:50:08 -07:00
Ray Smith 49cb438b1e Rollback of erroneous rollback.
PiperOrigin-RevId: 877376165
2026-03-02 06:50:26 -08:00
The gemma.cpp Authors a3d994915f No public description
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith 16c1b29b89 Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
Krzysztof Rymski df162ead7c Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models.
It also supports better parallelism for small batch sizes / small models.
It also is able to utilize VDPBF16PS for nice 2x improvement on avx512

PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00