gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Phil Culliton	1982a6ba00	Internal change PiperOrigin-RevId: 657831926	2024-07-30 20:24:54 -07:00
Jan Wassenberg	aaf51898b6	Major revamp #2 of Prefill: fix token order, parallel for multi-query - Allocate only the required KV caches and activation batch size - Add flags for batch sizes - Const-correct interface: Span of const int. - Also clean up the KVCache arg to a span. - Move kPrefillBatchSize into RuntimeConfig and remove related global constants. PiperOrigin-RevId: 655893197	2024-07-25 03:28:55 -07:00
Jan Wassenberg	12016d31c3	Major Prefill/Generate cleanup, 1.3x Prefill speedup This fixes TTFT, which was not including prefill. PiperOrigin-RevId: 653690626	2024-07-18 11:16:46 -07:00
Daniel Keysers	5a751a9a44	Update gemma-27b to the correct query scaling. PiperOrigin-RevId: 653201646	2024-07-17 05:43:52 -07:00
Jan Wassenberg	cd530374b3	Further 1.02x prefill speedup from batch 64->512 Measured on SKX. Larger speedup expected for Zen4/SPR. PiperOrigin-RevId: 652472928	2024-07-15 07:26:10 -07:00
The gemma.cpp Authors	c879133a5a	Increase the prefill batch size to 64. PiperOrigin-RevId: 651754772	2024-07-12 06:28:37 -07:00
Kan Wu	f519ab6693	Refactor configurables. PiperOrigin-RevId: 651259154	2024-07-10 21:30:58 -07:00
Jan Wassenberg	c7c3daa624	7x compile time speedup: shard gemma.cc Use overloaded functions defined in gemma/instantiations. Also split out activations.h. PiperOrigin-RevId: 649053122	2024-07-03 06:35:04 -07:00
Jan Wassenberg	09a7e75ead	Prep for sharding gemma.cc: split into kv_cache, tokenizer. Move activations.h to backprop/ to make space for another activations.h. PiperOrigin-RevId: 648744500	2024-07-02 09:31:06 -07:00
Paul Chang	8ac5d66575	Introduce new Gemma 9B and 27B configs PiperOrigin-RevId: 647299080	2024-06-27 06:45:24 -07:00
The gemma.cpp Authors	2228055bb8	Internal change. PiperOrigin-RevId: 643330703	2024-06-14 06:53:41 -07:00
Jan Wassenberg	d3c6a45b59	Major duplicated code reduction in test/benchmarks Helper functions to tokenize/wrap Move LayersOutputFunc into RuntimeConfig AcceptFunc passes the probability Implement StringFromType using the parser, and verify results match PiperOrigin-RevId: 643255119	2024-06-14 00:16:25 -07:00
Jan Wassenberg	c15ff9529c	Reduce duplication in Config* by inheriting no-SSM PiperOrigin-RevId: 643030629	2024-06-13 09:48:56 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Zoltan Szabadka	c004799cdc	Add Adam optimizer. Drive-by: Fix compilation errors and tests for backprop functions.	2024-06-06 18:41:36 +00:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl Split common and weights into separate lib Remove common-inl (does not have to be SIMD code), activations.cc Centralize switch(Model) to avoid duplication Move CompressWeightsT to compress_weights.cc Move LoadWeights to weights.cc PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Zoltan Szabadka	36e4d8bbfe	Add first version of backpropagation support. This is still in progress / experimental, currently it is only implemented for normal gemma MQA attention layers, and no parallelism is added yet for backward pass. Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.	2024-06-04 08:37:49 +00:00

17 Commits