3 Commits

Author SHA1 Message Date
Krzysztof Rymski 2344488566 Internal changes
PiperOrigin-RevId: 889294548
2026-03-25 09:46:12 -07:00
Krzysztof Rymski f56d18dd68 Improvements to inference using int8-compressed KVs
Multiplication is done using int16*int16 multiplication instructions to avoid expensive conversion to f32/bf16.
2x speedup on Zen 3.

PiperOrigin-RevId: 888690192
2026-03-24 08:51:30 -07:00
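
To illustrate the int16*int16 idea from commit f56d18dd68, here is a scalar sketch; it is not the code from the commit. The function name `DotInt8` and the per-vector scale factors `q_scale` and `k_scale` are assumptions made for this example. The commit message does not describe how the int8 values are scaled. The point is only that products are formed and accumulated in integer arithmetic, with a single conversion to float at the end instead of one per element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: dot product of two int8 vectors that each carry one
// float scale (an assumed quantization scheme, not taken from the commit).
float DotInt8(const std::vector<int8_t>& q, float q_scale,
              const std::vector<int8_t>& k, float k_scale) {
  assert(q.size() == k.size());
  int32_t acc = 0;  // integer accumulator; safe for vectors well under 100k elements
  for (size_t i = 0; i < q.size(); ++i) {
    // Widen int8 -> int16, then multiply into int32. SIMD integer
    // multiply-add instructions do this for many lanes at once.
    const int16_t a = q[i];
    const int16_t b = k[i];
    acc += static_cast<int32_t>(a) * static_cast<int32_t>(b);
  }
  // One int -> float conversion for the whole vector.
  return static_cast<float>(acc) * q_scale * k_scale;
}
```

For example, `DotInt8({1, -2, 3}, 0.5f, {4, 5, -6}, 0.25f)` accumulates 4 - 10 - 18 = -24 in integers and scales it once to -3.0.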
Krzysztof Rymski df162ead7c Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models.
It also supports better parallelism for small batch sizes and small models.
It can also use VDPBF16PS for a 2x improvement on AVX-512.

PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
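
As a rough picture of how a circular buffer can bound memory on long contexts (commit df162ead7c), here is a minimal sketch; it is not the implementation from the commit. The class name `RingCache`, its methods, and the one-float-per-position payload are all hypothetical. The sketch assumes the buffer only needs to hold a fixed window of the most recent positions, so position `pos` is stored in slot `pos % capacity` and older entries are overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity ring that keeps one value per sequence position.
// A real cache would hold key/value vectors per position instead of one float.
class RingCache {
 public:
  explicit RingCache(size_t capacity) : data_(capacity), capacity_(capacity) {
    assert(capacity > 0);
  }

  // Stores the value for the next position, overwriting the oldest slot
  // once the buffer is full.
  void Append(float v) {
    data_[next_pos_ % capacity_] = v;
    ++next_pos_;
  }

  // Oldest position that is still stored.
  size_t OldestPos() const {
    return next_pos_ > capacity_ ? next_pos_ - capacity_ : 0;
  }

  // Number of positions appended so far.
  size_t NextPos() const { return next_pos_; }

  // Reads the value at an absolute position that is still inside the window.
  float At(size_t pos) const {
    assert(pos >= OldestPos() && pos < next_pos_);
    return data_[pos % capacity_];
  }

 private:
  std::vector<float> data_;
  size_t capacity_;
  size_t next_pos_ = 0;
};
```

With a capacity of 4, appending positions 0 through 5 leaves positions 2 through 5 readable; memory use stays at 4 slots regardless of how long the sequence grows.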