Commit Graph

84 Commits

Author SHA1 Message Date
Jan Wassenberg 3e2396f98c Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc
accept_token: allow default, check if empty when using
allow mixing sample_func and stream_func, call the latter after the former
Also fix missing includes/deps.
PiperOrigin-RevId: 642240012
2024-06-11 05:53:10 -07:00
Daniel Keysers c557ad23a8 Adds simple-loop versions of missing batched functions.
PiperOrigin-RevId: 642189741
2024-06-11 02:14:02 -07:00
Jan Wassenberg c7f5e93136 Update benchmark with internal init
PiperOrigin-RevId: 641929308
2024-06-10 09:35:16 -07:00
Copybara-Service 49d814b519 Merge pull request #224 from szabadka:cleanup
PiperOrigin-RevId: 641922102
2024-06-10 09:11:13 -07:00
Jan Wassenberg c1c6714ad4 Internal experiment
PiperOrigin-RevId: 641915024
2024-06-10 08:46:10 -07:00
Zoltan Szabadka a3a75b77f9 Use CompressedWeights<TConfig<float>> in backpropagation.
kWeightsAreCompressed are removed and LoadRawWeights is moved
to compress_weights.cc
2024-06-10 14:34:24 +00:00
Phil Culliton c5bcb5438c Fix for transpose matrix creation and additional tests
PiperOrigin-RevId: 641868053
2024-06-10 05:24:04 -07:00
Jan Wassenberg 36e6915e18 Add CPU output, error if not C++17, simplify tokenizer ctor
PiperOrigin-RevId: 641850879
2024-06-10 04:01:11 -07:00
Phil Culliton d985d8b867 Shifting large matrix init to heap in ops_test.cc
PiperOrigin-RevId: 641311100
2024-06-07 11:38:42 -07:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command line flags, but the default value retains the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Copybara-Service 24db2ff725 Merge pull request #217 from szabadka:cross-entropy
PiperOrigin-RevId: 641241133
2024-06-07 07:17:35 -07:00
Daniel Keysers 06f814fc8b Small code cleanup suggestions while reading the code.
PiperOrigin-RevId: 641220788
2024-06-07 05:33:17 -07:00
Zoltan Szabadka 465998d25a Add support for custom sampling function to runtime config.
With this addition the ComputeCrossEntropy function can be moved
to its own library, because now we can compute it using only the
public API functions from gemma.h
2024-06-07 11:45:07 +00:00
Copybara-Service f7ac7092d6 Merge pull request #212 from szabadka:adam2
PiperOrigin-RevId: 641182573
2024-06-07 02:25:18 -07:00
Zoltan Szabadka c004799cdc Add Adam optimizer.
Drive-by: Fix compilation errors and tests for backprop functions.
2024-06-06 18:41:36 +00:00
Jan Wassenberg 12707ade80 Toward only using compressed weights:
CompressedLayer should all be f32 when weights are f32.

PiperOrigin-RevId: 640954519
2024-06-06 11:00:23 -07:00
Paul Chang 6c0be20fa6 Fix Softmax on SVE
PiperOrigin-RevId: 640947138
2024-06-06 10:39:30 -07:00
The gemma.cpp Authors 39d4115717 Implement mixed mode matmul: f32 * bf16
PiperOrigin-RevId: 640940962
2024-06-06 10:21:46 -07:00
Jan Wassenberg 57c2cd8b52 Simplifications: remove GemmaInterface and GemmaImpl
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc

PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Jan Wassenberg 5c3e5f7038 Remove no longer required stats.h - use Highway version instead
PiperOrigin-RevId: 640440379
2024-06-05 01:37:48 -07:00
Paul Chang 175e389c3c revert back to HWY_ASSERT for lane constraints, qualify hn::Add
PiperOrigin-RevId: 640193239
2024-06-04 10:10:18 -07:00
Phil Culliton e71d82ead9 Fix for GenerateZeroMat call in TestTiledMatMul
PiperOrigin-RevId: 640180868
2024-06-04 09:32:23 -07:00
Zelalem Aweke 9e213b3d96 Use system topology to pin threads across clusters.
PiperOrigin-RevId: 640151974
2024-06-04 07:50:32 -07:00
Jan Wassenberg 4f9155d8c6 Add bf16 matmul support, update naming+test
Avoid int32, which can easily overflow for large matrices.
Also fix IDE warning in sfp-inl.

PiperOrigin-RevId: 640149845
2024-06-04 07:41:46 -07:00
Zoltan Szabadka df01700b54 Move the backpropagation code to its own directory 2024-06-04 10:20:16 +00:00
Zoltan Szabadka 3b4fa4a0e3 Use HWY_EXPORT_AND_DYNAMIC_DISPATCH_T where possible. 2024-06-04 09:18:56 +00:00
Zoltan Szabadka 8567978541 Adress review comments 2024-06-04 08:37:54 +00:00
Zoltan Szabadka 7e639856da Fix compilation and tests for gcc 2024-06-04 08:37:54 +00:00
Zoltan Szabadka 36e4d8bbfe Add first version of backpropagation support.
This is still in progress / experimental, currently it is only
implemented for normal gemma MQA attention layers, and no
parallelism is added yet for backward pass.

Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
2024-06-04 08:37:49 +00:00
Paul Chang ed8f39c058 Refactor GemmaImpl dispatch to use Highway 1.2's HWY_DYNAMIC_DISPATCH_T
PiperOrigin-RevId: 639793810
2024-06-03 08:32:29 -07:00
Jan Wassenberg a44cbdadc2 Update to Highway 1.2 for topology/VQSelect
Also fix unused-warning in compress-inl.

PiperOrigin-RevId: 639116915
2024-05-31 12:29:10 -07:00
Paul Chang 5feacf120c static_assert shape constraints in MatMul 4x4
PiperOrigin-RevId: 639069345
2024-05-31 10:02:45 -07:00
Phil Culliton c616abe628 Unrolled / tiled 4x4 MatMul
PiperOrigin-RevId: 638384686
2024-05-29 13:02:35 -07:00
Paul Chang 419dc34ed5 Generic MHA/MQA/GQA implementation
PiperOrigin-RevId: 636937885
2024-05-24 09:05:53 -07:00
Zoltan Szabadka 542ad0973a Fix normalization in Softmax function. 2024-05-24 08:58:31 +00:00
Apoorv Reddy 1aaf3b3aae Documenting the RoPE implementation.
PiperOrigin-RevId: 636175297
2024-05-22 08:26:29 -07:00
Apoorv Reddy 7f4b85d00b Add MMLU eval to github
PiperOrigin-RevId: 635495178
2024-05-20 10:20:53 -07:00
Paul Chang 82623bdc7f Refer to --weights rather than --compressed_weights to simplify CLI docs
PiperOrigin-RevId: 634391135
2024-05-16 07:51:49 -07:00
Apoorv Reddy 8e641eb4cd Add TTFT to TimingInfo
PiperOrigin-RevId: 634378994
2024-05-16 07:16:53 -07:00
Apoorv Reddy eb0b96e0a8 Pass most runtime parameters using const RuntimeConfig&
PiperOrigin-RevId: 633572507
2024-05-14 07:04:53 -07:00
Apoorv Reddy f1eab987d8 Store tokens/sec in auxiliary struct TimingInfo.
PiperOrigin-RevId: 633108908
2024-05-13 00:04:19 -07:00
Jan Wassenberg 22fe9809ac Fix SVE build: add missing hn::
PiperOrigin-RevId: 632481097
2024-05-10 06:49:26 -07:00
Jan Wassenberg c5c9fc300c Enable even/odd for SFP. Refs #166
Disable it for float32 because there is not enough benefit.

PiperOrigin-RevId: 631788326
2024-05-08 07:09:06 -07:00
Paul Chang bacba351d4 Support additional scaling
PiperOrigin-RevId: 631429113
2024-05-07 08:16:25 -07:00
Jan Wassenberg f6d02b2870 Fix RecurrentGemma (refs #166) - one Dot was ignoring scale.
Remove extra Dot() overload
MatVecAdd always adds, use MatVecT<kAdd> if conditional.
Remove ununsed MatVecAddLoop and MatVecLoop
No longer tsan-verify even_odd

PiperOrigin-RevId: 631377279
2024-05-07 04:40:42 -07:00
Copybara-Service 8ed22e52bf Merge pull request #177 from szabadka:gemma2
PiperOrigin-RevId: 630388843
2024-05-03 07:52:27 -07:00
Zoltan Szabadka 19017fdb6d Fix expression in DASSERT() 2024-05-03 13:54:20 +00:00
Phil Culliton 28ca001d5e Matmul and test functions
PiperOrigin-RevId: 630373984
2024-05-03 06:39:36 -07:00
Zoltan Szabadka 429eb78512 Remove unused vars. 2024-05-03 13:37:17 +00:00
Zoltan Szabadka 3d72f17261 Use more parallelism in attention block in prefill mode.
Move the loop over the tokens inside the attention block and
then create kHeads * num_tokens threads.

This helps the multi-threaded speed only in case of the 2b gemma
model, but to be consistent we move the loop over the tokens inside
the griffin recurrent layer and the FFW layer as well. This is
also a preparation for using the MatMul operation later.

Benchmark results (summarization with 1600 tokens for prefill
and essay writing with 500 tokens for generation):

```
                   Prefill speed
Num threads      BEFORE       AFTER
32               61.76 t/s    65.08 t/s
64               89.46 t/s    98.62 t/s
```
2024-05-03 13:23:07 +00:00