gemma.cpp

Author	SHA1	Message	Date
Luca Versari	4c23932289	Improve weight handling. - Allow scaling of SFP weights - Allow using uncompressed weights - Do not try to compress weights in the main model calls - Reduce code duplication in weight handling with some macros Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Thomas Fischbacher <tfish@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 11:08:47 +02:00
RangerUFO	6923aec853	Add MQA support	2024-03-20 18:17:24 +08:00
RangerUFO	130e1f678f	Adjust vocab size to be the same as gemma_pytorch	2024-03-20 18:17:24 +08:00
RangerUFO	83ec42954f	Allow changing k parameter of `SampleTopK` as a compiler flag	2024-03-13 13:55:37 +08:00
austinvhuang	9cdc9223bc	clean up formatting after `129e66ada2`, add .clang-format defaults, minor updates to DEVELOPERS doc	2024-02-27 14:22:02 -05:00
Dan Zheng	afc354dcb1	Import from GitHub. PiperOrigin-RevId: 610595796	2024-02-26 19:05:11 -08:00
Dan Zheng	8db89304bd	No public description PiperOrigin-RevId: 610498969	2024-02-26 19:03:48 -08:00
austinvhuang	129e66ada2	Reduce KV cache preallocation to 4096 and make it comptime configurable, add rm build note in readme, add note on comptime options in DEVELOPERS, make multiturn=0 the default	2024-02-26 17:05:32 -05:00
The gemma_cpp Authors	587e80f276	Code update PiperOrigin-RevId: 609394329	2024-02-22 09:19:47 -08:00
Austin Huang	e29cd566cf	initial commit	2024-02-21 03:31:22 +00:00