10 Commits

Author SHA1 Message Date
Luca Versari 4c23932289 Improve weight handling.
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros

Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
RangerUFO 6923aec853 Add MQA support 2024-03-20 18:17:24 +08:00
RangerUFO 130e1f678f Adjust vocab size to be the same as gemma_pytorch 2024-03-20 18:17:24 +08:00
RangerUFO 83ec42954f Allow changing k parameter of `SampleTopK` as a compiler flag 2024-03-13 13:55:37 +08:00
austinvhuang 9cdc9223bc clean up formatting after 129e66ada2, add .clang-format defaults, minor updates to DEVELOPERS doc 2024-02-27 14:22:02 -05:00
Dan Zheng afc354dcb1 Import from GitHub.
PiperOrigin-RevId: 610595796
2024-02-26 19:05:11 -08:00
Dan Zheng 8db89304bd No public description
PiperOrigin-RevId: 610498969
2024-02-26 19:03:48 -08:00
austinvhuang 129e66ada2 Reduce KV cache preallocation to 4096 and make it comptime configurable, add rm build note in readme, add note on comptime options in DEVELOPERS, make multiturn=0 the default 2024-02-26 17:05:32 -05:00
The gemma_cpp Authors 587e80f276 Code update
PiperOrigin-RevId: 609394329
2024-02-22 09:19:47 -08:00
Austin Huang e29cd566cf initial commit 2024-02-21 03:31:22 +00:00