Jan Wassenberg
2ac47e4a06
Fix Py binding/run_example: use GemmaEnv
...
PiperOrigin-RevId: 644318962
2024-06-18 03:20:22 -07:00
Jan Wassenberg
a07f60c9a1
1.15x 7b sfp prefill speedup: Matmul in attention
...
2b bf16:
prefill 114.456 -> 115.222
decode 16.8847 -> 16.9987
7b sfp:
prefill 18.8575 -> 21.7325
decode 5.68428 -> 5.79791
PiperOrigin-RevId: 644283676
2024-06-18 01:00:51 -07:00
Jan Wassenberg
704d936764
Further simplification to ForEachTensor, thanks I.K.
...
PiperOrigin-RevId: 643996210
2024-06-17 07:12:26 -07:00
Jan Wassenberg
7d0720675f
Move raw_weights into separate header, used mainly by compress_weights.
...
Fix warnings in backprop/* (include)
PiperOrigin-RevId: 643983136
2024-06-17 06:17:02 -07:00
Jan Wassenberg
ad790d89d1
Fix DASSERT - TiledBatch requires at least 2 vectors.
...
Also use shorthand for weight types.
PiperOrigin-RevId: 643958371
2024-06-17 04:29:01 -07:00
The gemma.cpp Authors
7dbfa44794
Refactor CompressedWeights.
...
PiperOrigin-RevId: 643934198
2024-06-17 02:54:54 -07:00
Ray Smith
e0afdfa8fb
Added bias vector addition to MatMul
...
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
The gemma.cpp Authors
2228055bb8
Internal change.
...
PiperOrigin-RevId: 643330703
2024-06-14 06:53:41 -07:00
Jan Wassenberg
29c0c574e6
Integrate matmul into FFW: 4.3x prefill speedup
...
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec
after, bf16:
116.496 prefill tokens / sec
17.5391 tokens / sec
```
PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith
198326a682
Removed now redundant non-batch matmul
...
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov
b17631c95f
Implement a missing (bf16, f32) tiled MatMul kernel.
...
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Jan Wassenberg
d3c6a45b59
Major duplicated code reduction in test/benchmarks
...
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match
PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg
c15ff9529c
Reduce duplication in Config* by inheriting no-SSM
...
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Ray Smith
ea525da967
Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time.
...
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
The gemma.cpp Authors
1b40619864
Increase parallelism in ops_test
...
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00
Andrey Vlasov
38eb452b94
Support mixed (bf16, sfp) tiled MatMul.
...
Same sfp-decompress strategy as in (f32, sfp) tiled MatMul.
PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
Daniel Keysers
6e67a6d8a9
Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding.
...
PiperOrigin-RevId: 642614278
2024-06-12 07:52:13 -07:00
Daniel Keysers
1ac9857014
Extends Transformer() to prepare for batched processing.
...
PiperOrigin-RevId: 642603025
2024-06-12 07:01:03 -07:00
The gemma.cpp Authors
2a0e6ee976
Fix numerical issue in Softcap by subtracting max.
...
Also update test threshold.
PiperOrigin-RevId: 642587468
2024-06-12 05:42:16 -07:00
The gemma.cpp Authors
f467670de7
Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix.
...
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
Ray Smith
bdf33c7008
Updated benchmarks.cc to recent changes to Gemma API.
...
PiperOrigin-RevId: 642285902
2024-06-11 08:55:40 -07:00
Phil Culliton
b6565e3bf6
Update AssertClose for large matrices and add large matrix test
...
PiperOrigin-RevId: 642277221
2024-06-11 08:22:47 -07:00
Jan Wassenberg
3e2396f98c
Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc
...
accept_token: allow default, check if empty when using
allow mixing sample_func and stream_func, call the latter after the former
Also fix missing includes/deps.
PiperOrigin-RevId: 642240012
2024-06-11 05:53:10 -07:00
Daniel Keysers
c557ad23a8
Adds simple-loop versions of missing batched functions.
...
PiperOrigin-RevId: 642189741
2024-06-11 02:14:02 -07:00
Jan Wassenberg
c7f5e93136
Update benchmark with internal init
...
PiperOrigin-RevId: 641929308
2024-06-10 09:35:16 -07:00
Copybara-Service
49d814b519
Merge pull request #224 from szabadka:cleanup
...
PiperOrigin-RevId: 641922102
2024-06-10 09:11:13 -07:00
Jan Wassenberg
c1c6714ad4
Internal experiment
...
PiperOrigin-RevId: 641915024
2024-06-10 08:46:10 -07:00
Zoltan Szabadka
a3a75b77f9
Use CompressedWeights<TConfig<float>> in backpropagation.
...
kWeightsAreCompressed is removed and LoadRawWeights is moved
to compress_weights.cc
2024-06-10 14:34:24 +00:00
Phil Culliton
c5bcb5438c
Fix for transpose matrix creation and additional tests
...
PiperOrigin-RevId: 641868053
2024-06-10 05:24:04 -07:00
Jan Wassenberg
36e6915e18
Add CPU output, error if not C++17, simplify tokenizer ctor
...
PiperOrigin-RevId: 641850879
2024-06-10 04:01:11 -07:00
Phil Culliton
d985d8b867
Shifting large matrix init to heap in ops_test.cc
...
PiperOrigin-RevId: 641311100
2024-06-07 11:38:42 -07:00
Jan Wassenberg
f9b390b134
Support all weight types in a single binary.
...
This changes the command line flags, but the default value retains the previous behavior.
Also add a CreateGemma helper to enable extra args without interface changes.
PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Copybara-Service
24db2ff725
Merge pull request #217 from szabadka:cross-entropy
...
PiperOrigin-RevId: 641241133
2024-06-07 07:17:35 -07:00
Daniel Keysers
06f814fc8b
Small code cleanup suggestions while reading the code.
...
PiperOrigin-RevId: 641220788
2024-06-07 05:33:17 -07:00
Zoltan Szabadka
465998d25a
Add support for custom sampling function to runtime config.
...
With this addition the ComputeCrossEntropy function can be moved
to its own library, because now we can compute it using only the
public API functions from gemma.h
2024-06-07 11:45:07 +00:00
Copybara-Service
f7ac7092d6
Merge pull request #212 from szabadka:adam2
...
PiperOrigin-RevId: 641182573
2024-06-07 02:25:18 -07:00
Zoltan Szabadka
c004799cdc
Add Adam optimizer.
...
Drive-by: Fix compilation errors and tests for backprop functions.
2024-06-06 18:41:36 +00:00
Jan Wassenberg
12707ade80
Toward only using compressed weights:
...
CompressedLayer should all be f32 when weights are f32.
PiperOrigin-RevId: 640954519
2024-06-06 11:00:23 -07:00
Paul Chang
6c0be20fa6
Fix Softmax on SVE
...
PiperOrigin-RevId: 640947138
2024-06-06 10:39:30 -07:00
The gemma.cpp Authors
39d4115717
Implement mixed mode matmul: f32 * bf16
...
PiperOrigin-RevId: 640940962
2024-06-06 10:21:46 -07:00
Jan Wassenberg
57c2cd8b52
Simplifications: remove GemmaInterface and GemmaImpl
...
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc
PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Jan Wassenberg
5c3e5f7038
Remove no longer required stats.h - use Highway version instead
...
PiperOrigin-RevId: 640440379
2024-06-05 01:37:48 -07:00
Paul Chang
175e389c3c
revert back to HWY_ASSERT for lane constraints, qualify hn::Add
...
PiperOrigin-RevId: 640193239
2024-06-04 10:10:18 -07:00
Phil Culliton
e71d82ead9
Fix for GenerateZeroMat call in TestTiledMatMul
...
PiperOrigin-RevId: 640180868
2024-06-04 09:32:23 -07:00
Zelalem Aweke
9e213b3d96
Use system topology to pin threads across clusters.
...
PiperOrigin-RevId: 640151974
2024-06-04 07:50:32 -07:00
Jan Wassenberg
4f9155d8c6
Add bf16 matmul support, update naming+test
...
Avoid int32, which can easily overflow for large matrices.
Also fix IDE warning in sfp-inl.
PiperOrigin-RevId: 640149845
2024-06-04 07:41:46 -07:00
Zoltan Szabadka
df01700b54
Move the backpropagation code to its own directory
2024-06-04 10:20:16 +00:00
Zoltan Szabadka
3b4fa4a0e3
Use HWY_EXPORT_AND_DYNAMIC_DISPATCH_T where possible.
2024-06-04 09:18:56 +00:00
Zoltan Szabadka
8567978541
Address review comments
2024-06-04 08:37:54 +00:00
Zoltan Szabadka
7e639856da
Fix compilation and tests for gcc
2024-06-04 08:37:54 +00:00