Jan Wassenberg
|
01cdefeda7
|
1.64x batch=1 prefill speedup: nested parallelization for Attention
(DotSoftmaxWeightedSum)
Also fix TSan error in matmul (use atomic_flag instead of static)
PiperOrigin-RevId: 770241705
|
2025-06-11 11:28:46 -07:00 |
Jan Wassenberg
|
c027a45a2e
|
MatPtr-ify KV, shared div_seq_len, --seq_len flag
PiperOrigin-RevId: 770194455
|
2025-06-11 09:49:38 -07:00 |
Jan Wassenberg
|
6ee628ba38
|
Further cleanup: separate MatMulEnv arg
move row_ptrs into MatMulEnv
Consistent arg order: layer, activations, kv_cache, env
PiperOrigin-RevId: 767886386
|
2025-06-05 20:48:32 -07:00 |
Jan Wassenberg
|
3a266c662c
|
Split gemma-inl into separate source files
weights, mat: zero-initialize padding, required since the MatMul "avoid B decompress" optimization.
PiperOrigin-RevId: 767562313
|
2025-06-05 05:36:44 -07:00 |