Jan Wassenberg
|
5844e6a1e5
|
Cleanup: add wrapper functions and rename vars to interleaved
Simplifies the TransformerLayer function.
Use interleaved* instead of _and_queries.
PiperOrigin-RevId: 653929449
|
2024-07-19 02:04:11 -07:00 |
Jan Wassenberg
|
12016d31c3
|
Major Prefill/Generate cleanup, 1.3x Prefill speedup
This fixes TTFT (time to first token), which previously did not include prefill time.
PiperOrigin-RevId: 653690626
|
2024-07-18 11:16:46 -07:00 |
Daniel Keysers
|
e87e65ca45
|
Add scale parameter to MatMul.
Add accessor to CompressedArray that asserts the scale is 1 and use it.
PiperOrigin-RevId: 653604840
|
2024-07-18 06:58:56 -07:00 |
Jan Wassenberg
|
992a2cbbc0
|
De-templatize Activations, add RowVectorBatch class
Also remove most kBatchSize args.
PiperOrigin-RevId: 653185525
|
2024-07-17 04:38:15 -07:00 |
Daniel Keysers
|
ff34370aac
|
Simplify FFW by using MatMul_4x4_Batch_Add.
Affects only the Griffin model, where prefill TPS (tokens per second) improves by about 70%.
PiperOrigin-RevId: 652878176
|
2024-07-16 09:41:23 -07:00 |
Kan Wu
|
f519ab6693
|
Refactor configurables.
PiperOrigin-RevId: 651259154
|
2024-07-10 21:30:58 -07:00 |
Daniel Keysers
|
063bbaa683
|
Add more comments to attention computation (and some small restructuring).
PiperOrigin-RevId: 650929097
|
2024-07-10 02:39:07 -07:00 |
Jan Wassenberg
|
c7c3daa624
|
7x compile time speedup: shard gemma.cc
Use overloaded functions defined in gemma/instantiations.
Also split out activations.h.
PiperOrigin-RevId: 649053122
|
2024-07-03 06:35:04 -07:00 |