Jan Wassenberg
e5c81f64a1
Major refactor: clarify query_idx (global) vs qi. Refs #607
Fix missing pos increment for last prefill and check that in gemma_test.
Thanks to @ufownl for pointing this out.
Change argument lists to QBatch with accessors.
Increase default seq_len to 8k.
PiperOrigin-RevId: 771937385
2025-06-16 02:42:02 -07:00
Jan Wassenberg
01cdefeda7
1.64x batch=1 prefill speedup: nested parallelization for Attention (DotSoftmaxWeightedSum)
Also fix tsan error in matmul (atomic_flag instead of static)
PiperOrigin-RevId: 770241705
2025-06-11 11:28:46 -07:00
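The tsan fix in 01cdefeda7 replaces a `static` in matmul with `std::atomic_flag`. The log does not show the code. This sketch assumes the common case: a `static bool` guarded a one-time action, such as printing a warning, and several threads read and wrote it at once. That is a data race. `OnceWarner` and `CountPrintsFromThreads` are hypothetical names, not gemma.cpp APIs.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical helper (not gemma.cpp code): prints `msg` at most once,
// even if Warn() is called concurrently from many threads.
class OnceWarner {
 public:
  // Returns true only for the single caller that actually printed.
  bool Warn(const char* msg) {
    // test_and_set atomically sets the flag and returns its previous value,
    // so exactly one caller observes `false`. A plain `static bool` that is
    // read, then written, would be a data race that TSan reports.
    if (flag_.test_and_set(std::memory_order_relaxed)) return false;
    std::fprintf(stderr, "%s\n", msg);
    return true;
  }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts cleared
};

// Calls Warn() from `num_threads` threads and returns how many calls printed.
int CountPrintsFromThreads(int num_threads) {
  OnceWarner warner;
  std::atomic<int> printed{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i) {
    threads.emplace_back([&] {
      if (warner.Warn("warning: shown once")) printed.fetch_add(1);
    });
  }
  for (std::thread& t : threads) t.join();
  return printed.load();
}
```

A single read-modify-write on one atomic is totally ordered, so relaxed ordering is enough to guarantee a single winner. Stronger ordering would only matter if other threads had to observe data written by the winner.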
Jan Wassenberg
c027a45a2e
MatPtr-ify KV, shared div_seq_len, --seq_len flag
PiperOrigin-RevId: 770194455
2025-06-11 09:49:38 -07:00
Jan Wassenberg
6ee628ba38
Further cleanup: separate MatMulEnv arg
move row_ptrs into MatMulEnv
Consistent arg order: layer, activations, kv_cache, env
PiperOrigin-RevId: 767886386
2025-06-05 20:48:32 -07:00
Jan Wassenberg
3a266c662c
Split gemma-inl into separate source files
weights, mat: zero-initialize padding, required since the MatMul "avoid B decompress" optimization.
PiperOrigin-RevId: 767562313
2025-06-05 05:36:44 -07:00
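Commit 3a266c662c says that zero-initializing padding became necessary after the MatMul "avoid B decompress" optimization. The following explanation is an assumption, not taken from the code. A SIMD kernel that reads B in place may process whole vectors, padding included. In IEEE 754 arithmetic, 0 × NaN and 0 × Inf are both NaN. Uninitialized padding that happens to hold such values therefore corrupts the dot product, even when the other operand's padding is zero. `PaddedMat` and `DotFullStride` are hypothetical illustrations.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical padded row-major matrix: each row holds `cols` values followed
// by padding up to `stride`, a multiple of the SIMD lane count `lanes`.
struct PaddedMat {
  size_t rows, cols, stride;
  std::vector<float> data;

  // `pad_value` simulates whatever the padding contains (0 if zero-initialized).
  PaddedMat(size_t r, size_t c, size_t lanes, float pad_value)
      : rows(r), cols(c), stride((c + lanes - 1) / lanes * lanes),
        data(r * stride, pad_value) {}

  float* Row(size_t r) { return data.data() + r * stride; }
  const float* Row(size_t r) const { return data.data() + r * stride; }
};

// Mimics a vectorized kernel that processes all `stride` elements, padding
// included, to avoid a remainder loop. Assumes a.stride == b.stride.
float DotFullStride(const PaddedMat& a, size_t ra, const PaddedMat& b, size_t rb) {
  float sum = 0.0f;
  for (size_t i = 0; i < a.stride; ++i) sum += a.Row(ra)[i] * b.Row(rb)[i];
  return sum;
}
```

When both operands have zeroed padding, the full-stride loop gives the same result as a loop over only `cols`. If either operand has NaN padding, the result is NaN.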