25 Commits

Author SHA1 Message Date
Daniel Keysers 62c70d6715 Rename ModelTraining to PromptWrapping, which is a more accurate name.
PiperOrigin-RevId: 705881500
2024-12-13 07:45:59 -08:00
Daniel Keysers c6384574db Fix PaliGemma's GenerateImageTokensT().
Move image related config values from LayerConfig to ModelConfig.
Minor changes: add a few comments, remove gcpp:: qualification in a few places where it wasn't needed, and define local constants in VitAttention.DotSoftmaxWeightedSum().

PiperOrigin-RevId: 687210519
2024-10-18 01:34:13 -07:00
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and most of the type casting, and to allow the default destructor to be used and work properly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also makes it possible (though this is not yet done) to store the config in the blob file instead of specifying it separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith 85958f5fd3 Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.

PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Daniel Keysers f8835fe4a4 Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.

PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
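
For readers unfamiliar with the term in the entry above: in a prefix-LM, tokens in the prefix (here, image + prompt) attend to each other in both directions, while tokens after the prefix attend causally. The following is a minimal illustrative sketch of such an attention mask, not gemma.cpp's implementation; the function name `prefix_lm_mask` and its parameters are hypothetical.

```python
def prefix_lm_mask(prefix_len, total_len):
    """Return a total_len x total_len boolean mask (hypothetical helper).

    mask[i][j] is True if position i may attend to position j:
    every position sees the whole prefix, and additionally sees
    all positions up to and including itself (causal part).
    """
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


# Example: 3 prefix positions followed by 2 generated positions.
mask = prefix_lm_mask(prefix_len=3, total_len=5)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

With a prefix length of 0 this reduces to an ordinary causal mask, which is why prefix-LM support can be seen as a generalization of the decoder-only case.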
Jan Wassenberg 301dc8067a Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul
Supports converting all weight/activation formats to native MulT (bf16/f32)

Also:
- ConstMat/MutableMat for const correctness
- Move RowVectorBatch to allocator.h so it can be used from Matmul
- Add matmul.h so MatMulEnv can be used from Activations
- Remove kMaxThreads, detect from PerClusterPools
- Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h

```
zen4 new
64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp:   616.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp:   460.7 GFLOPS.
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp:    598.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp:    435.6 GFLOPS.

zen4 old
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp:    257.5 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp:    231.9 GFLOPS.
```

PiperOrigin-RevId: 663729812
2024-08-16 07:52:20 -07:00
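
The speedup range in the entry above can be checked against its own benchmark table. The sketch below copies the GFLOPS figures for the like-for-like configuration (MatTA=f32, MatTB=sfp) and divides new by old; the variable and function names are illustrative only, and the tuple keys simply repeat the three sizes as printed in the table.

```python
# GFLOPS figures copied from the benchmark output in the commit message
# (MatTA=f32, MatTB=sfp rows only, so old and new are directly comparable).
new_f32 = {(64, 24576, 3072): 598.6, (64, 3072, 24576): 435.6}
old_f32 = {(64, 24576, 3072): 257.5, (64, 3072, 24576): 231.9}


def speedups(new, old):
    """Ratio of new to old throughput for each benchmark shape."""
    return {shape: new[shape] / old[shape] for shape in old}


f32_speedups = speedups(new_f32, old_f32)
for shape, ratio in f32_speedups.items():
    print(shape, f"{ratio:.2f}x")
```

The two ratios come out at roughly 2.32x and 1.88x, which appears consistent with the "1.9-2.3x" figure in the commit title.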
Apoorv Reddy fd1b0743a7 Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B.
This makes it clear that these models belong to the Gemma2 family.

PiperOrigin-RevId: 661181682
2024-08-09 02:09:06 -07:00
Phil Culliton 1982a6ba00 Internal change
PiperOrigin-RevId: 657831926
2024-07-30 20:24:54 -07:00
Jan Wassenberg aaf51898b6 Major revamp #2 of Prefill: fix token order, parallel for multi-query
- Allocate only the required KV caches and activation batch size
- Add flags for batch sizes
- Const-correct interface: Span of const int.
- Also clean up the KVCache arg to a span.
- Move kPrefillBatchSize into RuntimeConfig and remove related global constants.

PiperOrigin-RevId: 655893197
2024-07-25 03:28:55 -07:00
Jan Wassenberg 12016d31c3 Major Prefill/Generate cleanup, 1.3x Prefill speedup
This fixes TTFT (time to first token), which previously did not include prefill.

PiperOrigin-RevId: 653690626
2024-07-18 11:16:46 -07:00
Daniel Keysers 5a751a9a44 Update gemma-27b to the correct query scaling.
PiperOrigin-RevId: 653201646
2024-07-17 05:43:52 -07:00
Jan Wassenberg cd530374b3 Further 1.02x prefill speedup from batch 64->512
Measured on SKX. Larger speedup expected for Zen4/SPR.

PiperOrigin-RevId: 652472928
2024-07-15 07:26:10 -07:00
The gemma.cpp Authors c879133a5a Increase the prefill batch size to 64.
PiperOrigin-RevId: 651754772
2024-07-12 06:28:37 -07:00
Kan Wu f519ab6693 Refactor configurables.
PiperOrigin-RevId: 651259154
2024-07-10 21:30:58 -07:00
Jan Wassenberg c7c3daa624 7x compile time speedup: shard gemma.cc
Use overloaded functions defined in gemma/instantiations.
Also split out activations.h.

PiperOrigin-RevId: 649053122
2024-07-03 06:35:04 -07:00
Jan Wassenberg 09a7e75ead Prep for sharding gemma.cc: split into kv_cache, tokenizer.
Move activations.h to backprop/ to make space for another activations.h.

PiperOrigin-RevId: 648744500
2024-07-02 09:31:06 -07:00
Paul Chang 8ac5d66575 Introduce new Gemma 9B and 27B configs
PiperOrigin-RevId: 647299080
2024-06-27 06:45:24 -07:00
The gemma.cpp Authors 2228055bb8 Internal change.
PiperOrigin-RevId: 643330703
2024-06-14 06:53:41 -07:00
Jan Wassenberg d3c6a45b59 Major duplicated code reduction in test/benchmarks
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match

PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg c15ff9529c Reduce duplication in Config* by inheriting no-SSM
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command line flags, but the default value retains the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Zoltan Szabadka c004799cdc Add Adam optimizer.
Drive-by: Fix compilation errors and tests for backprop functions.
2024-06-06 18:41:36 +00:00
Jan Wassenberg 57c2cd8b52 Simplifications: remove GemmaInterface and GemmaImpl
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc

PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Zoltan Szabadka 36e4d8bbfe Add first version of backpropagation support.
This is still in progress / experimental: it is currently implemented
only for normal Gemma MQA attention layers, and the backward pass is
not yet parallelized.

Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
2024-06-04 08:37:49 +00:00