Commit Graph

434 Commits

Author SHA1 Message Date
The gemma.cpp Authors 0e612d9a20 Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions.
PiperOrigin-RevId: 644455520
2024-06-18 11:24:56 -07:00
Paul Chang d7d9d14f0e Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly
For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero
and kGemmaLayers is just the number of layers.

PiperOrigin-RevId: 644384531
2024-06-18 07:52:52 -07:00
Jan Wassenberg 70506b0a62 Fix debug_prompt and other binaries (internal init)
PiperOrigin-RevId: 644367683
2024-06-18 06:48:59 -07:00
Jan Wassenberg 15135f5b3d Simplify Attention.
Shared kMHA, reuse from Activations,
inline Attn lambda, use QDim as the stride between successive Q.

PiperOrigin-RevId: 644343854
2024-06-18 05:08:12 -07:00
Jan Wassenberg 2ac47e4a06 Fix Py binding/run_example: use GemmaEnv
PiperOrigin-RevId: 644318962
2024-06-18 03:20:22 -07:00
Jan Wassenberg a07f60c9a1 1.15x 7b sfp prefill speedup: Matmul in attention
2b bf16:
prefill 114.456 -> 115.222
decode  16.8847 -> 16.9987

7b sfp:
prefill 18.8575 -> 21.7325
decode 5.68428 -> 5.79791

PiperOrigin-RevId: 644283676
2024-06-18 01:00:51 -07:00
Jan Wassenberg 355f7b4f80 Update developer docs and mention asan/msan
PiperOrigin-RevId: 644000220
2024-06-17 07:29:12 -07:00
Jan Wassenberg 704d936764 Further simplification to ForEachTensor, thanks I.K.
PiperOrigin-RevId: 643996210
2024-06-17 07:12:26 -07:00
Jan Wassenberg 7d0720675f Move raw_weights into separate header, used mainly by compress_weights.
Fix warnings in backprop/* (include)

PiperOrigin-RevId: 643983136
2024-06-17 06:17:02 -07:00
Jan Wassenberg ad790d89d1 Fix DASSERT - TiledBatch requires at least 2 vectors.
Also use shorthand for weight types.

PiperOrigin-RevId: 643958371
2024-06-17 04:29:01 -07:00
The gemma.cpp Authors 7dbfa44794 Refactor CompressedWeights.
PiperOrigin-RevId: 643934198
2024-06-17 02:54:54 -07:00
Ray Smith e0afdfa8fb Added bias vector addition to MatMul
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
The gemma.cpp Authors 2228055bb8 Internal change.
PiperOrigin-RevId: 643330703
2024-06-14 06:53:41 -07:00
Jan Wassenberg 29c0c574e6 Integrate matmul into FFW: 4.3x prefill speedup
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec

after, bf16
116.496 prefill tokens / sec
17.5391 tokens / sec
```

PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith 198326a682 Removed now redundant non-batch matmul
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov b17631c95f Implement a missing (bf16, f32) tiled MatMul kernel.
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Jan Wassenberg d3c6a45b59 Major duplicated code reduction in test/benchmarks
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match

PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg c15ff9529c Reduce duplication in Config* by inheriting no-SSM
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Ray Smith ea525da967 Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time.
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
The gemma.cpp Authors 1b40619864 Increase parallelism in ops_test
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00
Andrey Vlasov bf78a065e1 Make gemma/ops_test `large`.
PiperOrigin-RevId: 642923146
2024-06-13 03:33:46 -07:00
Andrey Vlasov 38eb452b94 Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32,
sfp) tiled MatMul.

PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
Daniel Keysers 6e67a6d8a9 Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding.
PiperOrigin-RevId: 642614278
2024-06-12 07:52:13 -07:00
Daniel Keysers 1ac9857014 Extends Transformer() to prepare for batched processing.
PiperOrigin-RevId: 642603025
2024-06-12 07:01:03 -07:00
The gemma.cpp Authors 2a0e6ee976 Fix numerical issue in Softcap by subtracting max.
Also update test threshold.

PiperOrigin-RevId: 642587468
2024-06-12 05:42:16 -07:00
Copybara-Service e37447cfe2 Merge pull request #234 from szabadka:build-fix
PiperOrigin-RevId: 642551103
2024-06-12 02:29:21 -07:00
Zoltan Szabadka d98523187c Add benchmark dependency to cmake build. 2024-06-12 08:14:29 +00:00
The gemma.cpp Authors f467670de7 Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix.
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
Zoltan Szabadka 9c869c4655 Revert "Add benchmark dependency to cmake build"
This reverts commit 12ce91a163.

Reason: accidentally pushed directly to dev branch, will redo with
a PR and copybara-import.
2024-06-12 07:56:03 +00:00
Zoltan Szabadka 12ce91a163 Add benchmark dependency to cmake build 2024-06-12 07:09:15 +00:00
Ray Smith bdf33c7008 Updated benchmarks.cc to recent changes to Gemma API.
PiperOrigin-RevId: 642285902
2024-06-11 08:55:40 -07:00
Phil Culliton b6565e3bf6 Update AssertClose for large matrices and add large matrix test
PiperOrigin-RevId: 642277221
2024-06-11 08:22:47 -07:00
Daniel Keysers 8ec8eef524 Add internal initialization code to debug_prompt.
PiperOrigin-RevId: 642276350
2024-06-11 08:19:38 -07:00
The gemma.cpp Authors 57d0ea95d0 Add buildcleaner: keep pragma to a dep in ops_test build rule and run build_cleaner.
PiperOrigin-RevId: 642275845
2024-06-11 08:17:47 -07:00
Jan Wassenberg 3e2396f98c Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc
accept_token: allow default, check if empty when using
allow mixing sample_func and stream_func, call the latter after the former
Also fix missing includes/deps.
PiperOrigin-RevId: 642240012
2024-06-11 05:53:10 -07:00
Jan Wassenberg a0e808e341 Add compression/ comments, especially on SFP range
PiperOrigin-RevId: 642238720
2024-06-11 05:47:49 -07:00
Daniel Keysers c557ad23a8 Adds simple-loop versions of missing batched functions.
PiperOrigin-RevId: 642189741
2024-06-11 02:14:02 -07:00
Jan Wassenberg c7f5e93136 Update benchmark with internal init
PiperOrigin-RevId: 641929308
2024-06-10 09:35:16 -07:00
Copybara-Service 49d814b519 Merge pull request #224 from szabadka:cleanup
PiperOrigin-RevId: 641922102
2024-06-10 09:11:13 -07:00
Jan Wassenberg c1c6714ad4 Internal experiment
PiperOrigin-RevId: 641915024
2024-06-10 08:46:10 -07:00
Zoltan Szabadka 6ca4a8e345 Address review comments 2024-06-10 15:27:22 +00:00
Zoltan Szabadka a3a75b77f9 Use CompressedWeights<TConfig<float>> in backpropagation.
kWeightsAreCompressed are removed and LoadRawWeights is moved
to compress_weights.cc
2024-06-10 14:34:24 +00:00
Jan Wassenberg 95fd7263ae Add missing test deps
PiperOrigin-RevId: 641880024
2024-06-10 06:22:07 -07:00
Phil Culliton c5bcb5438c Fix for transpose matrix creation and additional tests
PiperOrigin-RevId: 641868053
2024-06-10 05:24:04 -07:00
Jan Wassenberg 36e6915e18 Add CPU output, error if not C++17, simplify tokenizer ctor
PiperOrigin-RevId: 641850879
2024-06-10 04:01:11 -07:00
The gemma.cpp Authors 020db5a67d No public description
PiperOrigin-RevId: 641816837
2024-06-10 01:12:42 -07:00
Phil Culliton d985d8b867 Shifting large matrix init to heap in ops_test.cc
PiperOrigin-RevId: 641311100
2024-06-07 11:38:42 -07:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command line flags, but the default value retains the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Copybara-Service 24db2ff725 Merge pull request #217 from szabadka:cross-entropy
PiperOrigin-RevId: 641241133
2024-06-07 07:17:35 -07:00
Daniel Keysers 06f814fc8b Small code cleanup suggestions while reading the code.
PiperOrigin-RevId: 641220788
2024-06-07 05:33:17 -07:00