Jan Wassenberg
af8eb2fde3
Declutter gemma/ directory, move binaries to evals/ and util/.
PiperOrigin-RevId: 648400795
2024-07-01 09:51:04 -07:00
Jan Wassenberg
e588a7f45d
Add config for att/final cap, skip max-subtract. Fixes #278
Also update includes/deps for backprop/.
PiperOrigin-RevId: 648399222
2024-07-01 09:45:26 -07:00
The gemma.cpp Authors
da7507e6f0
Add prompt batching to gemma.cpp.
This CL adds a new function to Gemma that batches multiple prompts. It takes a vector of prompts and returns a vector of responses; the prompts are processed in parallel, and the responses are returned in the same order as the prompts.
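The ordering guarantee can be illustrated with a minimal sketch, assuming a hypothetical per-prompt `GenerateFn` callback that stands in for a model call. It simply launches one task per prompt and collects the futures in input order; the actual gemma.cpp batching may instead run the prompts through the model together, so `GenerateBatch` and `GenerateFn` are illustrative names, not the library's API.

```cpp
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical per-prompt generator; stands in for a real model call.
using GenerateFn = std::function<std::string(const std::string&)>;

// Runs `generate` on every prompt concurrently and returns the responses in
// the same order as `prompts`, regardless of which task finishes first.
std::vector<std::string> GenerateBatch(const std::vector<std::string>& prompts,
                                       const GenerateFn& generate) {
  std::vector<std::future<std::string>> pending;
  pending.reserve(prompts.size());
  for (const std::string& prompt : prompts) {
    // std::launch::async requests a separate thread per prompt. The captured
    // references stay valid because every future is joined below.
    pending.push_back(std::async(std::launch::async,
                                 [&generate, &prompt] { return generate(prompt); }));
  }
  std::vector<std::string> responses;
  responses.reserve(pending.size());
  // Waiting on the futures in submission order preserves the prompt order.
  for (std::future<std::string>& f : pending) responses.push_back(f.get());
  return responses;
}
```

Collecting results by index rather than by completion time is what keeps response `i` paired with prompt `i`.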
PiperOrigin-RevId: 648367559
2024-07-01 07:51:31 -07:00
Paul Chang
8ac5d66575
Introduce new Gemma 9B and 27B configs
PiperOrigin-RevId: 647299080
2024-06-27 06:45:24 -07:00
Paul Chang
78e96fdc70
Refactor model type / training tables, simplify reverse mapping
PiperOrigin-RevId: 647069372
2024-06-26 13:59:14 -07:00
Paul Chang
aa57fc3952
Remove unused BUILD dependency
PiperOrigin-RevId: 646519547
2024-06-25 10:12:13 -07:00
The gemma.cpp Authors
7fc8ddf825
Fix a clang tidy warning
PiperOrigin-RevId: 646498062
2024-06-25 09:02:59 -07:00
The gemma.cpp Authors
ef786f1bfc
Use hwy::ThreadPool::MaxThreads() to determine the number of threads to use.
PiperOrigin-RevId: 646117298
2024-06-24 09:16:04 -07:00
The gemma.cpp Authors
12089417b5
Improve logging when running Gemma examples: fix max_tokens, max_generated_tokens and temperature being logged without a trailing space or newline.
PiperOrigin-RevId: 646014268
2024-06-24 02:00:34 -07:00
The gemma.cpp Authors
80b1347393
Skip the last RMSNormInplaceBatched in the Prefill phase.
That call only modifies activations.x, and it is made with prefill_activations, which are not used after the Prefill call.
PiperOrigin-RevId: 645391387
2024-06-21 08:04:22 -07:00
Copybara-Service
82f16087ba
Merge pull request #266 from ufownl:bugfix/kvcache
PiperOrigin-RevId: 645329504
2024-06-21 03:06:52 -07:00
Copybara-Service
c2efcb0da4
Merge pull request #267 from ufownl:bugfix/clang_ce
PiperOrigin-RevId: 645329422
2024-06-21 03:06:04 -07:00
RangerUFO
f7855251ea
Fix compilation errors in clang
The errors occur on the `ubuntu-latest` runner in GitHub Actions.
2024-06-21 13:40:40 +08:00
RangerUFO
d7787c8f6c
Fix KV cache size calculation error
2024-06-21 13:06:26 +08:00
Daniel Keysers
0570972d43
Fix two typos.
PiperOrigin-RevId: 645103198
2024-06-20 11:33:12 -07:00
The gemma.cpp Authors
a85725614a
Refactor kCachePosSize and kCacheLayerSize into separate functors.
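A minimal sketch of cache-size functors, assuming a hypothetical `ConfigExample` with `kGemmaLayers`, `kKVHeads` and `kQKVDim` members; the values are illustrative and the exact formula in gemma.cpp may differ. Each cached position stores one K and one V vector per KV head, and only attention (Gemma) layers need a KV cache.

```cpp
#include <cstddef>

// Hypothetical config; member names follow the gemma.cpp style, values are
// illustrative only.
struct ConfigExample {
  static constexpr size_t kGemmaLayers = 18;
  static constexpr size_t kKVHeads = 1;
  static constexpr size_t kQKVDim = 256;
};

// Floats cached per layer for one token position: K and V for each KV head.
template <class TConfig>
struct CacheLayerSize {
  constexpr size_t operator()() const {
    return TConfig::kKVHeads * TConfig::kQKVDim * 2;  // 2 = K + V
  }
};

// Floats cached for one token position across all attention layers.
template <class TConfig>
struct CachePosSize {
  constexpr size_t operator()() const {
    return TConfig::kGemmaLayers * CacheLayerSize<TConfig>()();
  }
};
```

Defining the per-position size in terms of the per-layer size keeps the two from drifting apart, which is the kind of mismatch a KV cache size calculation bug can come from.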
PiperOrigin-RevId: 645048519
2024-06-20 08:52:08 -07:00
Jan Wassenberg
48ebba8b7a
Code cleanup
- Simplify template arg list, enable deduction
- missing hn:: on Lanes
- 1.0f suffix
- move RMSNormBatched into ops.h
- static constexpr -> constexpr
- concrete type instead of LayerT, WeightArrayT
- inline GetWeights
- remove if (runtime_config.verbosity
- merge AllocatePrefill and AllocateDecode
- remove bf_ffw_hidden
PiperOrigin-RevId: 644931277
2024-06-20 01:10:24 -07:00
The gemma.cpp Authors
658fb3e506
Move test placeholder to a later position.
PiperOrigin-RevId: 644808456
2024-06-19 13:24:10 -07:00
The gemma.cpp Authors
0e612d9a20
Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions.
PiperOrigin-RevId: 644455520
2024-06-18 11:24:56 -07:00
Paul Chang
d7d9d14f0e
Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly
For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero
and kGemmaLayers is just the number of layers.
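The inheritance can be sketched as follows, assuming hypothetical names `ConfigExample2B` and `kLayers` and an illustrative layer count. A shared base struct supplies `kGriffinLayers = 0` so each non-SSM config only needs to state its own layer count.

```cpp
// Hypothetical sketch: a base struct holds the defaults shared by all
// non-SSM (attention-only) configs.
struct ConfigNoSSM {
  static constexpr int kGriffinLayers = 0;  // no recurrent (Griffin) layers
};

// Hypothetical derived config; the layer count is illustrative only.
struct ConfigExample2B : ConfigNoSSM {
  static constexpr int kLayers = 18;
  static constexpr int kGemmaLayers = kLayers;  // every layer is attention
};

// For non-SSM models the two layer kinds must add up to the total.
static_assert(ConfigExample2B::kGemmaLayers + ConfigExample2B::kGriffinLayers ==
                  ConfigExample2B::kLayers,
              "layer counts must sum to kLayers");
```

Because static members are inherited, `ConfigExample2B::kGriffinLayers` resolves to the base value without being repeated in every config.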
PiperOrigin-RevId: 644384531
2024-06-18 07:52:52 -07:00
Jan Wassenberg
70506b0a62
Fix debug_prompt and other binaries (internal init)
PiperOrigin-RevId: 644367683
2024-06-18 06:48:59 -07:00
Jan Wassenberg
15135f5b3d
Simplify Attention.
Share kMHA, reuse from Activations, inline the Attn lambda, and use QDim as the stride between successive Q vectors.
PiperOrigin-RevId: 644343854
2024-06-18 05:08:12 -07:00
Jan Wassenberg
2ac47e4a06
Fix Py binding/run_example: use GemmaEnv
PiperOrigin-RevId: 644318962
2024-06-18 03:20:22 -07:00
Jan Wassenberg
a07f60c9a1
1.15x 7b sfp prefill speedup: Matmul in attention
2b bf16 (tokens/sec):
prefill 114.456 -> 115.222
decode 16.8847 -> 16.9987
7b sfp (tokens/sec):
prefill 18.8575 -> 21.7325
decode 5.68428 -> 5.79791
PiperOrigin-RevId: 644283676
2024-06-18 01:00:51 -07:00
Jan Wassenberg
355f7b4f80
Update developer docs and mention asan/msan
PiperOrigin-RevId: 644000220
2024-06-17 07:29:12 -07:00
Jan Wassenberg
704d936764
Further simplification to ForEachTensor, thanks I.K.
PiperOrigin-RevId: 643996210
2024-06-17 07:12:26 -07:00
Jan Wassenberg
7d0720675f
Move raw_weights into separate header, used mainly by compress_weights.
Fix warnings in backprop/* (include)
PiperOrigin-RevId: 643983136
2024-06-17 06:17:02 -07:00
Jan Wassenberg
ad790d89d1
Fix DASSERT - TiledBatch requires at least 2 vectors.
Also use shorthand for weight types.
PiperOrigin-RevId: 643958371
2024-06-17 04:29:01 -07:00
The gemma.cpp Authors
7dbfa44794
Refactor CompressedWeights.
PiperOrigin-RevId: 643934198
2024-06-17 02:54:54 -07:00
Ray Smith
e0afdfa8fb
Added bias vector addition to MatMul
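A minimal reference sketch of MatMul with bias addition, assuming row-major A (rows x inner) and B (inner x cols) and a bias vector of length cols that is added to every output row. `MatMulAddBias` is a hypothetical name; the gemma.cpp kernel is tiled and vectorized, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference: C = A * B + bias. A is rows x inner, B is
// inner x cols (both row-major); bias[j] is added to column j of every row.
std::vector<float> MatMulAddBias(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 const std::vector<float>& bias,
                                 size_t rows, size_t inner, size_t cols) {
  std::vector<float> c(rows * cols);
  for (size_t r = 0; r < rows; ++r) {
    for (size_t j = 0; j < cols; ++j) {
      float sum = bias[j];  // start from the bias instead of zero
      for (size_t k = 0; k < inner; ++k) {
        sum += a[r * inner + k] * b[k * cols + j];
      }
      c[r * cols + j] = sum;
    }
  }
  return c;
}
```

Initializing the accumulator with the bias folds the addition into the matmul, avoiding a separate pass over C.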
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
The gemma.cpp Authors
2228055bb8
Internal change.
PiperOrigin-RevId: 643330703
2024-06-14 06:53:41 -07:00
Jan Wassenberg
29c0c574e6
Integrate matmul into FFW: 4.3x prefill speedup
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec
after, bf16:
116.496 prefill tokens / sec
17.5391 tokens / sec
```
PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith
198326a682
Removed now redundant non-batch matmul
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov
b17631c95f
Implement a missing (bf16, f32) tiled MatMul kernel.
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Jan Wassenberg
d3c6a45b59
Major duplicated code reduction in test/benchmarks
- Helper functions to tokenize/wrap
- Move LayersOutputFunc into RuntimeConfig
- AcceptFunc passes the probability
- Implement StringFromType using the parser, and verify results match
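To illustrate what passing the probability to the accept callback enables, here is a minimal sketch with a hypothetical `AcceptFuncSketch` type and `SampleGreedyAccepted` helper (not gemma.cpp's actual signatures). The callback sees each candidate token together with its probability and can veto it; the sampler then picks the most probable accepted token.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical callback type: receives a candidate token id and its probability.
using AcceptFuncSketch = std::function<bool(int token, float prob)>;

// Returns the most probable token that `accept` allows, or -1 if none is
// accepted. An empty `accept` accepts every token.
int SampleGreedyAccepted(const std::vector<float>& probs,
                         const AcceptFuncSketch& accept) {
  int best = -1;
  for (size_t i = 0; i < probs.size(); ++i) {
    const int token = static_cast<int>(i);
    if (accept && !accept(token, probs[i])) continue;  // vetoed by the callback
    if (best < 0 || probs[i] > probs[best]) best = token;
  }
  return best;
}
```

With the probability available, a callback can filter on confidence (for example rejecting low-probability tokens) rather than only on token identity.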
PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg
c15ff9529c
Reduce duplication in Config* by inheriting no-SSM
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Ray Smith
ea525da967
Added MatMul_4x4_Batch, which is MatMul_4x4 but with the first template argument moved to the first function argument, so the batch size (number of A rows) can vary at run time.
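The difference between a compile-time and a run-time batch size can be sketched with hypothetical functions `MatMulFixedBatch` and `MatMulBatch` (not the gemma.cpp kernels). The template version needs one instantiation per batch size, while the run-time version handles any number of A rows by processing full 4-row tiles and then the remaining 0 to 3 rows. The inner loops are plain scalar code; a real kernel would keep each tile in registers.

```cpp
#include <cstddef>

// Computes rows [row_begin, row_begin + num_rows) of C = A * B, where A is
// batch x inner and B is inner x cols, both row-major.
static void MatMulRows(size_t row_begin, size_t num_rows, const float* a,
                       const float* b, float* c, size_t inner, size_t cols) {
  for (size_t r = row_begin; r < row_begin + num_rows; ++r) {
    for (size_t j = 0; j < cols; ++j) {
      float sum = 0.0f;
      for (size_t k = 0; k < inner; ++k) sum += a[r * inner + k] * b[k * cols + j];
      c[r * cols + j] = sum;
    }
  }
}

// Compile-time batch size: a separate instantiation for each kBatch.
template <size_t kBatch>
void MatMulFixedBatch(const float* a, const float* b, float* c, size_t inner,
                      size_t cols) {
  MatMulRows(0, kBatch, a, b, c, inner, cols);
}

// Run-time batch size: one function handles any number of A rows.
void MatMulBatch(size_t batch, const float* a, const float* b, float* c,
                 size_t inner, size_t cols) {
  size_t r = 0;
  for (; r + 4 <= batch; r += 4) MatMulRows(r, 4, a, b, c, inner, cols);
  if (r < batch) MatMulRows(r, batch - r, a, b, c, inner, cols);  // remainder
}
```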
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
The gemma.cpp Authors
1b40619864
Increase parallelism in ops_test
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00
Andrey Vlasov
bf78a065e1
Make gemma/ops_test `large`.
PiperOrigin-RevId: 642923146
2024-06-13 03:33:46 -07:00
Andrey Vlasov
38eb452b94
Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32, sfp) tiled MatMul.
PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
Daniel Keysers
6e67a6d8a9
Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding.
PiperOrigin-RevId: 642614278
2024-06-12 07:52:13 -07:00
Daniel Keysers
1ac9857014
Extends Transformer() to prepare for batched processing.
PiperOrigin-RevId: 642603025
2024-06-12 07:01:03 -07:00
The gemma.cpp Authors
2a0e6ee976
Fix numerical issue in Softcap by subtracting max.
Also update test threshold.
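A minimal sketch of logit soft-capping with an optional max subtraction, assuming the common form cap * tanh(x / cap). How gemma.cpp's Softcap combines the two steps is not spelled out in this log, so `SoftcapSketch` and its `subtract_max` flag are hypothetical and only illustrate the idea. With the max subtracted, the largest logit maps to exactly 0 and all others to negative values; either way every output stays strictly inside (-cap, cap).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: x[i] = cap * tanh((x[i] - offset) / cap), where offset
// is the maximum of x if subtract_max is set, else 0.
void SoftcapSketch(float* x, size_t size, float cap, bool subtract_max) {
  float offset = 0.0f;
  if (subtract_max && size != 0) offset = *std::max_element(x, x + size);
  const float inv_cap = 1.0f / cap;
  for (size_t i = 0; i < size; ++i) {
    x[i] = cap * std::tanh((x[i] - offset) * inv_cap);
  }
}
```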
PiperOrigin-RevId: 642587468
2024-06-12 05:42:16 -07:00
Copybara-Service
e37447cfe2
Merge pull request #234 from szabadka:build-fix
PiperOrigin-RevId: 642551103
2024-06-12 02:29:21 -07:00
Zoltan Szabadka
d98523187c
Add benchmark dependency to cmake build.
2024-06-12 08:14:29 +00:00
The gemma.cpp Authors
f467670de7
Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB-sized chunks of the second matrix.
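The chunked-decompression idea can be sketched as follows. Two assumptions to flag: bfloat16 (stored as `uint16_t`) stands in for SFP, whose decoder is not reproduced here, and B is assumed to be stored transposed (cols x inner) so that four output columns correspond to one contiguous chunk of 4 * inner compressed values. Each chunk is decompressed into a float buffer once and then reused for every row of A. All function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in compressed format: bfloat16 is the upper 16 bits of a float.
inline float F32FromBF16(uint16_t bits) {
  const uint32_t u = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Truncating float -> bfloat16 conversion, used to build test inputs.
inline uint16_t BF16FromF32(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return static_cast<uint16_t>(u >> 16);
}

// C = A * B. A is rows x inner (float, row-major); b_t holds B transposed
// (cols x inner, compressed). Up to 4 columns are decompressed per chunk.
std::vector<float> MatMulDecompressChunks(const std::vector<float>& a,
                                          const std::vector<uint16_t>& b_t,
                                          size_t rows, size_t inner,
                                          size_t cols) {
  std::vector<float> c(rows * cols, 0.0f);
  std::vector<float> chunk(4 * inner);
  for (size_t j0 = 0; j0 < cols; j0 += 4) {
    const size_t n = (cols - j0 < 4) ? (cols - j0) : 4;  // last chunk may be short
    for (size_t i = 0; i < n * inner; ++i) chunk[i] = F32FromBF16(b_t[j0 * inner + i]);
    for (size_t r = 0; r < rows; ++r) {
      for (size_t jj = 0; jj < n; ++jj) {
        float sum = 0.0f;
        for (size_t k = 0; k < inner; ++k) sum += a[r * inner + k] * chunk[jj * inner + k];
        c[r * cols + j0 + jj] = sum;
      }
    }
  }
  return c;
}
```

Decompressing per chunk bounds the float buffer to 4 * inner values while still amortizing each decode across all rows of A.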
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
Zoltan Szabadka
9c869c4655
Revert "Add benchmark dependency to cmake build"
This reverts commit 12ce91a163.
Reason: accidentally pushed directly to dev branch, will redo with
a PR and copybara-import.
2024-06-12 07:56:03 +00:00
Zoltan Szabadka
12ce91a163
Add benchmark dependency to cmake build
2024-06-12 07:09:15 +00:00
Ray Smith
bdf33c7008
Updated benchmarks.cc for recent changes to the Gemma API.
PiperOrigin-RevId: 642285902
2024-06-11 08:55:40 -07:00
Phil Culliton
b6565e3bf6
Update AssertClose for large matrices and add large matrix test
PiperOrigin-RevId: 642277221
2024-06-11 08:22:47 -07:00