Luca Versari
4c23932289
Improve weight handling.
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
RangerUFO
6923aec853
Add MQA support
2024-03-20 18:17:24 +08:00
RangerUFO
130e1f678f
Adjust vocab size to be the same as gemma_pytorch
2024-03-20 18:17:24 +08:00
RangerUFO
83ec42954f
Allow changing k parameter of `SampleTopK` as a compiler flag
2024-03-13 13:55:37 +08:00
austinvhuang
9cdc9223bc
clean up formatting after 129e66ada2, add .clang-format defaults, minor updates to DEVELOPERS doc
2024-02-27 14:22:02 -05:00
Dan Zheng
afc354dcb1
Import from GitHub.
PiperOrigin-RevId: 610595796
2024-02-26 19:05:11 -08:00
Dan Zheng
8db89304bd
No public description
PiperOrigin-RevId: 610498969
2024-02-26 19:03:48 -08:00
austinvhuang
129e66ada2
Reduce KV cache preallocation to 4096 and make it comptime configurable, add rm build note in readme, add note on comptime options in DEVELOPERS, make multiturn=0 the default
2024-02-26 17:05:32 -05:00
The gemma_cpp Authors
587e80f276
Code update
PiperOrigin-RevId: 609394329
2024-02-22 09:19:47 -08:00
Austin Huang
e29cd566cf
initial commit
2024-02-21 03:31:22 +00:00