621 Commits

Author SHA1 Message Date
Jan Wassenberg 29c0c574e6 Integrate matmul into FFW: 4.3x prefill speedup
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec

after, bf16:
116.496 prefill tokens / sec
17.5391 tokens / sec
```

PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
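
The speedup in the subject line of 29c0c574e6 can be checked from the figures in its message. The sketch below only divides the "after" throughput by the "before" throughput; the function names are hypothetical and not part of gemma.cpp.

```cpp
#include <cstdio>

// Ratio of throughput after a change to throughput before it.
// Hypothetical helper for illustration; not a gemma.cpp function.
double Speedup(double before_tokens_per_sec, double after_tokens_per_sec) {
  return after_tokens_per_sec / before_tokens_per_sec;
}

// Prints both ratios using the bf16 figures quoted in the commit message.
void PrintSpeedups() {
  std::printf("prefill: %.2fx\n", Speedup(27.2929, 116.496));
  std::printf("decode:  %.2fx\n", Speedup(17.2114, 17.5391));
}
```

With these inputs the prefill ratio is about 4.27, which matches the rounded 4.3x in the subject line, while the decode ratio stays close to 1.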
Ray Smith 198326a682 Removed now redundant non-batch matmul
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov b17631c95f Implement a missing (bf16, f32) tiled MatMul kernel.
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Jan Wassenberg d3c6a45b59 Major duplicated code reduction in test/benchmarks
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match

PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg c15ff9529c Reduce duplication in Config* by inheriting no-SSM
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Ray Smith ea525da967 Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time.
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
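
The change described in ea525da967 (batch size moved from the first template argument to the first function argument) can be illustrated roughly as follows. This is a minimal sketch using a naive row-major loop, not the tiled 4x4 kernel; the names MatMulBatchSketch and MatMulStaticSketch and the memory layout are assumptions made for illustration.

```cpp
#include <cstddef>

// Runtime batch size: the number of A rows is an ordinary function argument,
// so a single instantiation serves any batch size.
// Assumes row-major A (batch_size x kColsA_RowsB), B (kColsA_RowsB x kColsBC)
// and C (batch_size x kColsBC). Hypothetical sketch, not the gemma.cpp kernel.
template <std::size_t kColsA_RowsB, std::size_t kColsBC>
void MatMulBatchSketch(std::size_t batch_size, const float* a, const float* b,
                       float* c) {
  for (std::size_t row = 0; row < batch_size; ++row) {
    for (std::size_t col = 0; col < kColsBC; ++col) {
      float sum = 0.0f;
      for (std::size_t k = 0; k < kColsA_RowsB; ++k) {
        sum += a[row * kColsA_RowsB + k] * b[k * kColsBC + col];
      }
      c[row * kColsBC + col] = sum;
    }
  }
}

// Compile-time batch size: every distinct kBatchSize is a separate
// instantiation. Here it simply forwards to the runtime variant.
template <std::size_t kBatchSize, std::size_t kColsA_RowsB,
          std::size_t kColsBC>
void MatMulStaticSketch(const float* a, const float* b, float* c) {
  MatMulBatchSketch<kColsA_RowsB, kColsBC>(kBatchSize, a, b, c);
}
```

The runtime variant is useful when the number of tokens processed together is only known during generation, for example a final prefill batch that is shorter than the others.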
The gemma.cpp Authors 1b40619864 Increase parallelism in ops_test
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00
Andrey Vlasov bf78a065e1 Make gemma/ops_test `large`.
PiperOrigin-RevId: 642923146
2024-06-13 03:33:46 -07:00
Andrey Vlasov 38eb452b94 Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32, sfp) tiled MatMul.

PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
Daniel Keysers 6e67a6d8a9 Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding.
PiperOrigin-RevId: 642614278
2024-06-12 07:52:13 -07:00
Daniel Keysers 1ac9857014 Extends Transformer() to prepare for batched processing.
PiperOrigin-RevId: 642603025
2024-06-12 07:01:03 -07:00
The gemma.cpp Authors 2a0e6ee976 Fix numerical issue in Softcap by subtracting max.
Also update test threshold.

PiperOrigin-RevId: 642587468
2024-06-12 05:42:16 -07:00
Copybara-Service e37447cfe2 Merge pull request #234 from szabadka:build-fix
PiperOrigin-RevId: 642551103
2024-06-12 02:29:21 -07:00
Zoltan Szabadka d98523187c Add benchmark dependency to cmake build.
2024-06-12 08:14:29 +00:00
The gemma.cpp Authors f467670de7 Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix.
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
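
The strategy named in f467670de7 (decompress a chunk of the second matrix of size 4 * kColsA_RowsB, then multiply against it) might look roughly like the sketch below. Several things here are assumptions: the 8-bit codec is a toy stand-in and not the real SFP format, B is assumed to be stored transposed so that each chunk covers four contiguous columns, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a compressed 8-bit weight. NOT the real SFP encoding.
inline float DecodeToy(std::int8_t v) { return static_cast<float>(v) / 64.0f; }

// Computes C (batch x cols_bc) = A (batch x k) * B, where B is held
// compressed and transposed: cols_bc runs of k bytes, one run per B column.
// Only a 4 * k float buffer is ever decompressed at a time.
void MatMulDecompressChunks(std::size_t batch, std::size_t k,
                            std::size_t cols_bc, const float* a,
                            const std::int8_t* b_trans, float* c) {
  constexpr std::size_t kChunkCols = 4;
  std::vector<float> buf(kChunkCols * k);
  for (std::size_t col0 = 0; col0 < cols_bc; col0 += kChunkCols) {
    // The last chunk may hold fewer than four columns.
    const std::size_t n = std::min(kChunkCols, cols_bc - col0);
    // Decompress this chunk of B into the float buffer.
    for (std::size_t i = 0; i < n * k; ++i) {
      buf[i] = DecodeToy(b_trans[col0 * k + i]);
    }
    // Multiply every row of A against the decompressed columns.
    for (std::size_t row = 0; row < batch; ++row) {
      for (std::size_t j = 0; j < n; ++j) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < k; ++i) {
          sum += a[row * k + i] * buf[j * k + i];
        }
        c[row * cols_bc + col0 + j] = sum;
      }
    }
  }
}
```

Decompressing in small chunks keeps the temporary float buffer small and reuses it across all rows of A, instead of expanding the whole second matrix to float up front.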
Zoltan Szabadka 9c869c4655 Revert "Add benchmark dependency to cmake build"
This reverts commit 12ce91a163.

Reason: accidentally pushed directly to dev branch, will redo with
a PR and copybara-import.
2024-06-12 07:56:03 +00:00
Zoltan Szabadka 12ce91a163 Add benchmark dependency to cmake build
2024-06-12 07:09:15 +00:00
Ray Smith bdf33c7008 Updated benchmarks.cc to recent changes to Gemma API.
PiperOrigin-RevId: 642285902
2024-06-11 08:55:40 -07:00
Phil Culliton b6565e3bf6 Update AssertClose for large matrices and add large matrix test
PiperOrigin-RevId: 642277221
2024-06-11 08:22:47 -07:00
Daniel Keysers 8ec8eef524 Add internal initialization code to debug_prompt.
PiperOrigin-RevId: 642276350
2024-06-11 08:19:38 -07:00
The gemma.cpp Authors 57d0ea95d0 Add buildcleaner: keep pragma to a dep in ops_test build rule and run build_cleaner.
PiperOrigin-RevId: 642275845
2024-06-11 08:17:47 -07:00
Jan Wassenberg 3e2396f98c Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc
accept_token: allow default, check if empty when using
allow mixing sample_func and stream_func, call the latter after the former
Also fix missing includes/deps.
PiperOrigin-RevId: 642240012
2024-06-11 05:53:10 -07:00
Jan Wassenberg a0e808e341 Add compression/ comments, especially on SFP range
PiperOrigin-RevId: 642238720
2024-06-11 05:47:49 -07:00
Daniel Keysers c557ad23a8 Adds simple-loop versions of missing batched functions.
PiperOrigin-RevId: 642189741
2024-06-11 02:14:02 -07:00
Jan Wassenberg c7f5e93136 Update benchmark with internal init
PiperOrigin-RevId: 641929308
2024-06-10 09:35:16 -07:00
Copybara-Service 49d814b519 Merge pull request #224 from szabadka:cleanup
PiperOrigin-RevId: 641922102
2024-06-10 09:11:13 -07:00
Jan Wassenberg c1c6714ad4 Internal experiment
PiperOrigin-RevId: 641915024
2024-06-10 08:46:10 -07:00
Zoltan Szabadka 6ca4a8e345 Address review comments
2024-06-10 15:27:22 +00:00
Zoltan Szabadka a3a75b77f9 Use CompressedWeights<TConfig<float>> in backpropagation.
kWeightsAreCompressed is removed and LoadRawWeights is moved
to compress_weights.cc
2024-06-10 14:34:24 +00:00
Jan Wassenberg 95fd7263ae Add missing test deps
PiperOrigin-RevId: 641880024
2024-06-10 06:22:07 -07:00
Phil Culliton c5bcb5438c Fix for transpose matrix creation and additional tests
PiperOrigin-RevId: 641868053
2024-06-10 05:24:04 -07:00
Jan Wassenberg 36e6915e18 Add CPU output, error if not C++17, simplify tokenizer ctor
PiperOrigin-RevId: 641850879
2024-06-10 04:01:11 -07:00
The gemma.cpp Authors 020db5a67d No public description
PiperOrigin-RevId: 641816837
2024-06-10 01:12:42 -07:00
Phil Culliton d985d8b867 Shifting large matrix init to heap in ops_test.cc
PiperOrigin-RevId: 641311100
2024-06-07 11:38:42 -07:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command line flags, but the default value retains the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Copybara-Service 24db2ff725 Merge pull request #217 from szabadka:cross-entropy
PiperOrigin-RevId: 641241133
2024-06-07 07:17:35 -07:00
Daniel Keysers 06f814fc8b Small code cleanup suggestions while reading the code.
PiperOrigin-RevId: 641220788
2024-06-07 05:33:17 -07:00
Zoltan Szabadka 465998d25a Add support for custom sampling function to runtime config.
With this addition the ComputeCrossEntropy function can be moved
to its own library, because now we can compute it using only the
public API functions from gemma.h
2024-06-07 11:45:07 +00:00
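
The idea in 465998d25a, computing cross entropy through a custom sampling callback rather than through internal hooks, can be sketched as below. All types and names here (SampleFunc, RuntimeConfigSketch, GenerateSketch, CrossEntropySketch) are hypothetical stand-ins and not the actual gemma.h API. The assumption is that the generation loop calls the callback once per position with that position's probabilities.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback type: receives the probabilities for one position
// and returns the token to continue with.
using SampleFunc =
    std::function<int(const float* probs, std::size_t vocab_size)>;

// Hypothetical runtime configuration carrying the custom sampler.
struct RuntimeConfigSketch {
  SampleFunc sample_func;
};

// Stand-in for a generation loop: calls sample_func once per position.
// A real loop would feed the returned token back into the model.
void GenerateSketch(const std::vector<std::vector<float>>& probs_per_pos,
                    const RuntimeConfigSketch& config) {
  for (const std::vector<float>& probs : probs_per_pos) {
    config.sample_func(probs.data(), probs.size());
  }
}

// Cross entropy of a known token sequence: the "sampler" always returns the
// next reference token and accumulates its negative log probability.
float CrossEntropySketch(const std::vector<std::vector<float>>& probs_per_pos,
                         const std::vector<int>& tokens) {
  float total = 0.0f;
  std::size_t pos = 0;
  RuntimeConfigSketch config;
  config.sample_func = [&](const float* probs, std::size_t /*vocab_size*/) {
    const int token = tokens[pos++];
    total -= std::log(probs[token]);
    return token;
  };
  GenerateSketch(probs_per_pos, config);
  return total;
}
```

Because the callback only needs the probabilities passed to it, code like CrossEntropySketch can live outside the library and depend on the public interface alone.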
Copybara-Service f7ac7092d6 Merge pull request #212 from szabadka:adam2
PiperOrigin-RevId: 641182573
2024-06-07 02:25:18 -07:00
Jan Wassenberg e3f4374e81 Fix fix for weight type define, refs #198
GEMMA_WEIGHT_T is indeed the correct flag for the C++ compiler,
but the readme references CMake, and there the correct flag name is WEIGHT_TYPE.

PiperOrigin-RevId: 641170380
2024-06-07 01:32:25 -07:00
Jan Wassenberg 8dc0e5ea83 Fix reference to GEMMA_WEIGHT_T. Refs #198
PiperOrigin-RevId: 641161403
2024-06-07 00:54:30 -07:00
Zoltan Szabadka c004799cdc Add Adam optimizer.
Drive-by: Fix compilation errors and tests for backprop functions.
2024-06-06 18:41:36 +00:00
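
For reference, one step of the standard Adam update with bias correction is sketched below. This is the textbook form of the algorithm, not the code added in c004799cdc; the struct, function name, and default hyperparameters are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-parameter optimizer state: first and second moment estimates and the
// step counter. Hypothetical layout for illustration.
struct AdamState {
  std::vector<float> m;
  std::vector<float> v;
  std::size_t t = 0;
};

// Applies one Adam step to weights `w` given gradients `grad`.
void AdamStep(std::vector<float>& w, const std::vector<float>& grad,
              AdamState& state, float lr = 1e-3f, float beta1 = 0.9f,
              float beta2 = 0.999f, float eps = 1e-8f) {
  if (state.m.empty()) {
    state.m.assign(w.size(), 0.0f);
    state.v.assign(w.size(), 0.0f);
  }
  ++state.t;
  // Bias-correction terms compensate for the zero-initialized moments.
  const float correction1 = 1.0f - std::pow(beta1, static_cast<float>(state.t));
  const float correction2 = 1.0f - std::pow(beta2, static_cast<float>(state.t));
  for (std::size_t i = 0; i < w.size(); ++i) {
    state.m[i] = beta1 * state.m[i] + (1.0f - beta1) * grad[i];
    state.v[i] = beta2 * state.v[i] + (1.0f - beta2) * grad[i] * grad[i];
    const float m_hat = state.m[i] / correction1;
    const float v_hat = state.v[i] / correction2;
    w[i] -= lr * m_hat / (std::sqrt(v_hat) + eps);
  }
}
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each weight moves by roughly the learning rate regardless of the gradient's magnitude.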
Jan Wassenberg 12707ade80 Toward only using compressed weights:
CompressedLayer should all be f32 when weights are f32.

PiperOrigin-RevId: 640954519
2024-06-06 11:00:23 -07:00
Paul Chang 6c0be20fa6 Fix Softmax on SVE
PiperOrigin-RevId: 640947138
2024-06-06 10:39:30 -07:00
The gemma.cpp Authors 39d4115717 Implement mixed mode matmul: f32 * bf16
PiperOrigin-RevId: 640940962
2024-06-06 10:21:46 -07:00
Jan Wassenberg 57c2cd8b52 Simplifications: remove GemmaInterface and GemmaImpl
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc

PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Jan Wassenberg 5c3e5f7038 Remove no longer required stats.h - use Highway version instead
PiperOrigin-RevId: 640440379
2024-06-05 01:37:48 -07:00
Paul Chang 175e389c3c revert back to HWY_ASSERT for lane constraints, qualify hn::Add
PiperOrigin-RevId: 640193239
2024-06-04 10:10:18 -07:00
Phil Culliton e71d82ead9 Fix for GenerateZeroMat call in TestTiledMatMul
PiperOrigin-RevId: 640180868
2024-06-04 09:32:23 -07:00
Zelalem Aweke 9e213b3d96 Use system topology to pin threads across clusters.
PiperOrigin-RevId: 640151974
2024-06-04 07:50:32 -07:00