Commit Graph

16 Commits

Author SHA1 Message Date
Jan Wassenberg 02ce1e344f Use NestedPools, add NUMA infra
Improved threading.h, fixed thread counts for single-package/cluster systems
Temporarily forces execution to a single socket. Prefill 29.28 tps, decode 6.92 tps.

Also fix benchmarks.cc build, update tensor allocator to Allocator

PiperOrigin-RevId: 687307167
2024-10-18 08:11:18 -07:00
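A minimal sketch of the kind of topology-aware thread-count clamping the commit above describes; NestedPools is the name from the commit, but the Topology struct, its fields, and ChoosePoolSizes below are illustrative assumptions, not the actual threading.h API.
```
// Illustrative only: derive per-cluster thread counts and temporarily
// restrict work to a single socket (package), as in the commit above.
#include <algorithm>
#include <cstddef>
#include <thread>

struct Topology {            // Hypothetical: detected via OS/NUMA queries.
  size_t packages = 1;       // Sockets.
  size_t clusters = 1;       // Core clusters per package.
  size_t cores_per_cluster = 1;
};

struct PoolSizes {
  size_t outer;  // One worker per cluster.
  size_t inner;  // Threads within a cluster.
};

PoolSizes ChoosePoolSizes(const Topology& t) {
  const size_t packages = 1;  // Temporarily force a single socket.
  // On single-package/single-cluster systems, clamp to the hardware
  // threads actually available to avoid oversubscription.
  const size_t hw = std::max<size_t>(1, std::thread::hardware_concurrency());
  PoolSizes sizes;
  sizes.outer = std::max<size_t>(1, packages * t.clusters);
  sizes.inner = std::max<size_t>(1, std::min(t.cores_per_cluster, hw / sizes.outer));
  return sizes;
}
```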
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and eliminate most of the type casting, as well as allowing the default destructor to be used and to work correctly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also enables (though this is not yet done) storing the config in the blob file instead of specifying it separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
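A rough sketch of the shape of this change: layers and weights are constructed from runtime LayerConfig/WeightsConfig instances, and only the weight type remains a template parameter. The fields and members below are assumptions, not the actual gemma.cpp definitions.
```
#include <cstddef>
#include <vector>

struct LayerConfig {            // Runtime description of one layer.
  size_t model_dim = 0;
  size_t ff_dim = 0;
  size_t heads = 0;
};

struct WeightsConfig {          // Runtime description of the whole model.
  std::vector<LayerConfig> layers;
};

// Before: template <class TConfig> struct CompressedLayer { ... };
// After: only the weight type is a template parameter, so one
// instantiation per weight type suffices instead of one per model config.
template <typename Weight>
struct CompressedLayer {
  explicit CompressedLayer(const LayerConfig& config)
      : config(config),
        attn_w(config.model_dim * config.heads),
        gating_w(config.model_dim * config.ff_dim) {}
  LayerConfig config;
  std::vector<Weight> attn_w;    // Sized at runtime from the config.
  std::vector<Weight> gating_w;
};

template <typename Weight>
struct CompressedWeights {
  explicit CompressedWeights(const WeightsConfig& config) {
    layers.reserve(config.layers.size());
    for (const LayerConfig& lc : config.layers) layers.emplace_back(lc);
  }
  std::vector<CompressedLayer<Weight>> layers;
};
```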
Ray Smith 85958f5fd3 Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.

PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
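A minimal sketch of the idea behind MatPtr/MatPtrT/MatStorageT: the shape is fixed in the constructor, while allocation is a separate step that can run in parallel once all shapes are known. The class layout below is an assumption, not the real implementation.
```
#include <cstddef>
#include <memory>

class MatPtr {                        // Type-erased view: shape plus pointer.
 public:
  MatPtr(size_t rows, size_t cols) : rows_(rows), cols_(cols) {}
  size_t Rows() const { return rows_; }
  size_t Cols() const { return cols_; }
 protected:
  size_t rows_;
  size_t cols_;
  void* data_ = nullptr;
};

template <typename T>
class MatPtrT : public MatPtr {       // Adds the element type.
 public:
  MatPtrT(size_t rows, size_t cols) : MatPtr(rows, cols) {}
  T* Data() { return static_cast<T*>(data_); }
};

template <typename T>
class MatStorageT : public MatPtrT<T> {  // Owns the backing allocation.
 public:
  // The size is defined here, in the constructor, as described above.
  MatStorageT(size_t rows, size_t cols) : MatPtrT<T>(rows, cols) {}
  // Allocation is a separate step so many tensors can be allocated in
  // parallel after construction.
  void Allocate() {
    storage_ = std::make_unique<T[]>(this->Rows() * this->Cols());
    this->data_ = storage_.get();
  }
 private:
  std::unique_ptr<T[]> storage_;
};
```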
Jan Wassenberg 7d9fcda0d8 -467ms startup: parallel Reshape
Also split an Argmax helper out of Softmax and add comments;
add profiler zones and fix an IDE warning

PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
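A scalar sketch of an Argmax helper of the kind split out of Softmax here; the actual gemma.cpp routine is vectorized, so this only illustrates the interface.
```
#include <cstddef>

// Returns the index of the largest logit; assumes n >= 1.
size_t Argmax(const float* logits, size_t n) {
  size_t best = 0;
  for (size_t i = 1; i < n; ++i) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}
```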
Jan Wassenberg 897f902d28 Fix include order, required to build with profiler enabled
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Jan Wassenberg b831fa8482 1.3x prefill, 0.95x decode: matmul replacing last matvec
Before: 38.28 prefill tps, 9.17 decode tps (with profiler enabled, prompt = 330 tokens)
```
Gen.FFW                                 :      15414 x         4692352 = 24.166318
Gen.Attention.SumHeads                  :      15414 x         1394804 =  7.183451 !!
Gen.Embedding                           :        361 x        49961894 =  6.026297
Gen.Attention.QKV                       :      15414 x         1005125 =  5.176546
Gen.Attention.DotSoftmax                :      15414 x          885480 =  4.560357
RopeAndMulBy                            :     696528 x           11867 =  2.761818
```

After: 49.80 prefill tps, 8.68 decode tps
```
Gen.FFW                                 :      14448 x         5312783 = 25.646868
Gen.Embedding                           :        338 x        63044815 =  7.119845
Gen.Attention.QKV                       :      14448 x         1115003 =  5.382557
Gen.Attention.DotSoftmax                :      14448 x          897577 =  4.332957
RopeAndMulBy                            :     673344 x           11886 =  2.674156
Gen.Attention.SumHeads                  :      14448 x          518291 =  2.501993 !!
```
PiperOrigin-RevId: 662024085
2024-08-12 03:36:01 -07:00
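The gist of the change: the final projection of attention head outputs ("SumHeads") is computed for a whole batch of prefill tokens with one matmul instead of one matvec per token, which amortizes weight loads across tokens (hence faster prefill but slightly slower decode, where the batch is a single token). A sketch under assumed row-major layouts and hypothetical function names, not the gemma.cpp MatMul/MatVec signatures.
```
#include <cstddef>
#include <vector>

// C[b][r] = dot(A[b], W[r]); A is [batch x cols], W is [rows x cols], row-major.
void MatMul(const float* A, const float* W, float* C,
            size_t batch, size_t rows, size_t cols) {
  for (size_t b = 0; b < batch; ++b) {
    for (size_t r = 0; r < rows; ++r) {
      float sum = 0.0f;
      for (size_t c = 0; c < cols; ++c) sum += A[b * cols + c] * W[r * cols + c];
      C[b * rows + r] = sum;
    }
  }
}

// Before: one matvec per token to project the concatenated head outputs.
void SumHeadsPerToken(const std::vector<const float*>& per_token_heads,
                      const float* w_o, float* out,
                      size_t model_dim, size_t heads_dim) {
  for (size_t t = 0; t < per_token_heads.size(); ++t) {
    MatMul(per_token_heads[t], w_o, out + t * model_dim,
           /*batch=*/1, model_dim, heads_dim);
  }
}

// After: stack the prefill tokens and issue a single matmul, amortizing
// weight loads across the batch.
void SumHeadsBatched(const float* stacked_heads, const float* w_o, float* out,
                     size_t num_tokens, size_t model_dim, size_t heads_dim) {
  MatMul(stacked_heads, w_o, out, num_tokens, model_dim, heads_dim);
}
```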
Daniel Keysers e87e65ca45 Add scale parameter to MatMul.
Add accessor to CompressedArray that asserts the scale is 1 and use it.

PiperOrigin-RevId: 653604840
2024-07-18 06:58:56 -07:00
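A hedged sketch of the two pieces this commit describes: a MatMul-style kernel that takes an explicit output scale, and a CompressedArray accessor that asserts the stored scale is 1 for call sites that do not propagate it. Names and fields below are illustrative.
```
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class CompressedArray {
 public:
  float Scale() const { return scale_; }
  // Accessor for call sites that ignore the scale; asserts it really is 1
  // so silently dropping it cannot skew results.
  const T* DataScale1() const {
    assert(scale_ == 1.0f);
    return data_.data();
  }
 private:
  std::vector<T> data_;
  float scale_ = 1.0f;
};

// MatMul-style kernel with an explicit scale applied to the output:
// C[b][r] = scale * dot(A[b], W[r]).
void MatMulScaled(const float* A, const float* W, float scale, float* C,
                  size_t batch, size_t rows, size_t cols) {
  for (size_t b = 0; b < batch; ++b) {
    for (size_t r = 0; r < rows; ++r) {
      float sum = 0.0f;
      for (size_t c = 0; c < cols; ++c) sum += A[b * cols + c] * W[r * cols + c];
      C[b * rows + r] = scale * sum;
    }
  }
}
```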
Jan Wassenberg 704d936764 Further simplification to ForEachTensor, thanks I.K.
PiperOrigin-RevId: 643996210
2024-06-17 07:12:26 -07:00
Jan Wassenberg 7d0720675f Move raw_weights into separate header, used mainly by compress_weights.
Fix warnings in backprop/* (include)

PiperOrigin-RevId: 643983136
2024-06-17 06:17:02 -07:00
The gemma.cpp Authors 7dbfa44794 Refactor CompressedWeights.
PiperOrigin-RevId: 643934198
2024-06-17 02:54:54 -07:00
Zoltan Szabadka a3a75b77f9 Use CompressedWeights<TConfig<float>> in backpropagation.
kWeightsAreCompressed is removed and LoadRawWeights is moved
to compress_weights.cc
2024-06-10 14:34:24 +00:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command-line flags, but the default values retain the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
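A sketch of how a single binary can cover all weight types: a runtime flag is parsed into a weight-type tag, and a switch forwards to code templated only on that type. The Type enum values, GemmaT, and the CreateGemma signature below are placeholders, not the real gemma.cpp API.
```
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>

enum class Type { kF32, kBF16, kSFP };  // Hypothetical weight-type tags.

struct GemmaBase {
  virtual ~GemmaBase() = default;
};
template <typename Weight>
struct GemmaT : GemmaBase { /* weights, KV cache, ... */ };

Type ParseType(const std::string& flag) {  // e.g. from a weight-type flag.
  if (flag == "f32") return Type::kF32;
  if (flag == "bf16") return Type::kBF16;
  if (flag == "sfp") return Type::kSFP;
  throw std::invalid_argument("unknown weight type: " + flag);
}

// Helper in the spirit of the CreateGemma mentioned above: extra arguments
// can be threaded through here without changing every caller.
std::unique_ptr<GemmaBase> CreateGemma(Type type) {
  switch (type) {
    case Type::kF32:  return std::make_unique<GemmaT<float>>();
    case Type::kBF16: return std::make_unique<GemmaT<uint16_t>>();  // bf16 bits
    case Type::kSFP:  return std::make_unique<GemmaT<uint8_t>>();   // 8-bit stand-in
  }
  return nullptr;  // Unreachable for valid enum values.
}
```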
Zoltan Szabadka c004799cdc Add Adam optimizer.
Drive-by: Fix compilation errors and tests for backprop functions.
2024-06-06 18:41:36 +00:00
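For reference, the standard per-parameter Adam update (Kingma & Ba, 2015); how it is wired into gemma.cpp's backprop is not shown here.
```
#include <cmath>
#include <cstddef>
#include <vector>

struct AdamState {
  std::vector<float> m, v;  // First and second moment estimates.
  size_t t = 0;             // Step count.
};

void AdamUpdate(std::vector<float>& w, const std::vector<float>& grad,
                AdamState& s, float lr = 1e-3f, float beta1 = 0.9f,
                float beta2 = 0.999f, float eps = 1e-8f) {
  if (s.m.empty()) { s.m.assign(w.size(), 0.0f); s.v.assign(w.size(), 0.0f); }
  ++s.t;
  const float bc1 = 1.0f - std::pow(beta1, static_cast<float>(s.t));
  const float bc2 = 1.0f - std::pow(beta2, static_cast<float>(s.t));
  for (size_t i = 0; i < w.size(); ++i) {
    s.m[i] = beta1 * s.m[i] + (1.0f - beta1) * grad[i];
    s.v[i] = beta2 * s.v[i] + (1.0f - beta2) * grad[i] * grad[i];
    const float m_hat = s.m[i] / bc1;  // Bias-corrected moments.
    const float v_hat = s.v[i] / bc2;
    w[i] -= lr * m_hat / (std::sqrt(v_hat) + eps);
  }
}
```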
Jan Wassenberg 57c2cd8b52 Simplifications: remove GemmaInterface and GemmaImpl
Split common and weights into separate lib
Remove common-inl (it does not have to be SIMD code) and activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc

PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
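A sketch of what centralizing switch(Model) can look like: one helper maps the runtime Model value to code templated on a per-model config (which still existed at this point; the "Eliminated TConfig" commit above removed it later). Enum values, config structs, and the functor protocol are placeholders.
```
#include <stdexcept>
#include <utility>

enum class Model { kGemma2B, kGemma7B };  // Placeholder values.

struct ConfigGemma2B { static constexpr int kLayers = 18; };
struct ConfigGemma7B { static constexpr int kLayers = 28; };

// Single dispatch point: callers pass a functor templated on the config
// and never write the switch themselves.
template <template <typename TConfig> class Func, typename... Args>
decltype(auto) CallForModel(Model model, Args&&... args) {
  switch (model) {
    case Model::kGemma2B:
      return Func<ConfigGemma2B>()(std::forward<Args>(args)...);
    case Model::kGemma7B:
      return Func<ConfigGemma7B>()(std::forward<Args>(args)...);
  }
  throw std::invalid_argument("unknown model");
}

// Example functor: report the layer count for the selected model.
template <typename TConfig>
struct NumLayers {
  int operator()() const { return TConfig::kLayers; }
};
// Usage: int n = CallForModel<NumLayers>(Model::kGemma7B);
```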
Zoltan Szabadka 8567978541 Address review comments 2024-06-04 08:37:54 +00:00
Zoltan Szabadka 36e4d8bbfe Add first version of backpropagation support.
This is still in progress / experimental; currently it is only
implemented for normal Gemma MQA attention layers, and no
parallelism has been added yet for the backward pass.

Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
2024-06-04 08:37:49 +00:00
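A minimal sketch of the idea: the forward pass records every layer's activations so the backward pass can walk them in reverse. The structure and names below are illustrative assumptions, not the actual data structure added by this commit.
```
#include <cstddef>
#include <vector>

struct LayerActivations {
  std::vector<float> input;     // Input to the layer.
  std::vector<float> attn_out;  // Attention output.
  std::vector<float> ffw_out;   // Feed-forward output.
};

struct ForwardPass {
  std::vector<LayerActivations> layers;  // One entry per layer, all retained.
  std::vector<float> logits;
};

// Backward pass consumes the recorded activations from last layer to first.
void Backward(const ForwardPass& fwd, std::vector<float>& grad) {
  for (size_t i = fwd.layers.size(); i-- > 0;) {
    const LayerActivations& act = fwd.layers[i];
    (void)act;   // Use act.input / act.attn_out / act.ffw_out to form grads.
  }
  (void)grad;
}
```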