Kan Wu
f519ab6693
Refactor configurables.
PiperOrigin-RevId: 651259154
2024-07-10 21:30:58 -07:00
Andrey Vlasov
960ff4b4ec
Record time measurements in MatMul tests.
PiperOrigin-RevId: 651060711
2024-07-10 10:04:40 -07:00
Jan Wassenberg
ee6e017a77
Fix windows build: min conflict, unused VF
PiperOrigin-RevId: 650955138
2024-07-10 04:18:25 -07:00
Daniel Keysers
063bbaa683
Add more comments to attention computation (and some small restructuring).
PiperOrigin-RevId: 650929097
2024-07-10 02:39:07 -07:00
Daniel Keysers
cf76f0a401
Update gemma_test to also pass for the v1.1. models.
Make it an error if the model cannot be loaded.
PiperOrigin-RevId: 650232602
2024-07-08 06:45:37 -07:00
Jan Wassenberg
6a3f7cf3ea
Lint fix - string append, remove stale TODO
PiperOrigin-RevId: 650197468
2024-07-08 04:11:21 -07:00
Jan Wassenberg
cbb67b4ee0
Move benchmark_helper to evals/, weights_raw to compression/.
PiperOrigin-RevId: 650155983
2024-07-08 01:13:23 -07:00
Daniel Keysers
cdebcc3533
Update gemma_test with the expected entropy values for the IT models of size 2B/7B/9B/27B.
PiperOrigin-RevId: 649662047
2024-07-05 08:58:51 -07:00
Jan Wassenberg
438b1bace2
Fix handling of %c and %q if eot_string. Fixes #283, thanks @ljcucc
PiperOrigin-RevId: 649651535
2024-07-05 07:54:00 -07:00
Jan Wassenberg
f823371691
Cleanup: move util/compress and convert_weights to compression/
Also remove unused models/, lint convert_weights
PiperOrigin-RevId: 649613088
2024-07-05 04:16:52 -07:00
Jan Wassenberg
41efec4dba
Add Py bindings for weight compression
TODO: this uses clif instead of pybind11, and depends on absl.
PiperOrigin-RevId: 649575815
2024-07-05 01:06:00 -07:00
Jan Wassenberg
118e802b00
Fix gemma_test - moved to evals/.
PiperOrigin-RevId: 649338633
2024-07-04 02:04:05 -07:00
Jan Wassenberg
c7c3daa624
7x compile time speedup: shard gemma.cc
Use overloaded functions defined in gemma/instantiations.
Also split out activations.h.
PiperOrigin-RevId: 649053122
2024-07-03 06:35:04 -07:00
Daniel Keysers
a40165dea2
Small cleanups. Fixes gemma_test build.
PiperOrigin-RevId: 649008524
2024-07-03 03:13:38 -07:00
Kan Wu
7e4b20455e
Add sliding window attention for Gemma 2.
PiperOrigin-RevId: 648778253
2024-07-02 11:08:03 -07:00
Jan Wassenberg
09a7e75ead
Prep for sharding gemma.cc: split into kv_cache, tokenizer.
Move activations.h to backprop/ to make space for another activations.h.
PiperOrigin-RevId: 648744500
2024-07-02 09:31:06 -07:00
Jan Wassenberg
85fcd3cd80
Cleanup: add ModelInfo struct, remove gcpp::
PiperOrigin-RevId: 648707763
2024-07-02 07:11:15 -07:00
Jan Wassenberg
b1c1ec1d59
Use benchmark_helper in py bindings (adds BOS)
Also remove thread clamp (OK to be zero or large).
PiperOrigin-RevId: 648657155
2024-07-02 03:27:15 -07:00
Jan Wassenberg
e527e7662e
Remove unused kSystemPrompt
PiperOrigin-RevId: 648429567
2024-07-01 11:18:07 -07:00
Jan Wassenberg
af8eb2fde3
Declutter gemma/ directory, move binaries to evals/ and util/.
PiperOrigin-RevId: 648400795
2024-07-01 09:51:04 -07:00
Jan Wassenberg
e588a7f45d
Add config for att/final cap, skip max-subtract. Fixes #278
Also update includes/deps for backprop/.
PiperOrigin-RevId: 648399222
2024-07-01 09:45:26 -07:00
The gemma.cpp Authors
da7507e6f0
Add prompt batching to Gemma.cpp.
This CL adds a new function to Gemma that allows for batching of multiple prompts. The function takes a vector of prompts and returns a vector of responses. The prompts are processed in parallel, and the responses are returned in the same order as the prompts.
PiperOrigin-RevId: 648367559
2024-07-01 07:51:31 -07:00
Paul Chang
8ac5d66575
Introduce new Gemma 9B and 27B configs
PiperOrigin-RevId: 647299080
2024-06-27 06:45:24 -07:00
Paul Chang
78e96fdc70
Refactor model type / training tables, simplify reverse mapping
PiperOrigin-RevId: 647069372
2024-06-26 13:59:14 -07:00
Paul Chang
aa57fc3952
Remove unused BUILD dependency
PiperOrigin-RevId: 646519547
2024-06-25 10:12:13 -07:00
The gemma.cpp Authors
7fc8ddf825
Fix a clang tidy warning
PiperOrigin-RevId: 646498062
2024-06-25 09:02:59 -07:00
The gemma.cpp Authors
ef786f1bfc
Use hwy::ThreadPool::MaxThreads() to determine the number of threads to use.
PiperOrigin-RevId: 646117298
2024-06-24 09:16:04 -07:00
The gemma.cpp Authors
12089417b5
Improve logging when running Gemma examples: fix the issue where max_tokens, max_generated_tokens and temperature were logged without any trailing space/newline.
PiperOrigin-RevId: 646014268
2024-06-24 02:00:34 -07:00
The gemma.cpp Authors
80b1347393
Skip the last RMSNormInplaceBatched in the Prefill phase.
That only modifies activations.x, but it is called with prefill_activations which are not used after the Prefill call.
PiperOrigin-RevId: 645391387
2024-06-21 08:04:22 -07:00
Copybara-Service
82f16087ba
Merge pull request #266 from ufownl:bugfix/kvcache
PiperOrigin-RevId: 645329504
2024-06-21 03:06:52 -07:00
Copybara-Service
c2efcb0da4
Merge pull request #267 from ufownl:bugfix/clang_ce
PiperOrigin-RevId: 645329422
2024-06-21 03:06:04 -07:00
RangerUFO
f7855251ea
Fix compilation errors in clang
These errors occur on the `ubuntu-latest` runner of GitHub Actions.
2024-06-21 13:40:40 +08:00
RangerUFO
d7787c8f6c
Fix KV cache size calculation error
2024-06-21 13:06:26 +08:00
Daniel Keysers
0570972d43
Fixing two typos.
PiperOrigin-RevId: 645103198
2024-06-20 11:33:12 -07:00
The gemma.cpp Authors
a85725614a
Refactor kCachePosSize and kCacheLayerSize into separate functors.
PiperOrigin-RevId: 645048519
2024-06-20 08:52:08 -07:00
Jan Wassenberg
48ebba8b7a
Code cleanup
- Simplify template arg list, enable deduction
- missing hn:: on Lanes
- 1.0f suffix
- move RMSNormBatched into ops.h
- static constexpr -> constexpr
- concrete type instead of LayerT, WeightArrayT
- inline GetWeights
- remove if (runtime_config.verbosity
- merge AllocatePrefill and AllocateDecode
- remove bf_ffw_hidden
PiperOrigin-RevId: 644931277
2024-06-20 01:10:24 -07:00
The gemma.cpp Authors
658fb3e506
Move test placeholder to a later pos.
PiperOrigin-RevId: 644808456
2024-06-19 13:24:10 -07:00
The gemma.cpp Authors
0e612d9a20
Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions.
PiperOrigin-RevId: 644455520
2024-06-18 11:24:56 -07:00
Paul Chang
d7d9d14f0e
Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly
For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero
and kGemmaLayers is just the number of layers.
PiperOrigin-RevId: 644384531
2024-06-18 07:52:52 -07:00
Jan Wassenberg
70506b0a62
Fix debug_prompt and other binaries (internal init)
PiperOrigin-RevId: 644367683
2024-06-18 06:48:59 -07:00
Jan Wassenberg
15135f5b3d
Simplify Attention.
Shared kMHA, reuse from Activations,
inline Attn lambda, use QDim as the stride between successive Q.
PiperOrigin-RevId: 644343854
2024-06-18 05:08:12 -07:00
Jan Wassenberg
2ac47e4a06
Fix Py binding/run_example: use GemmaEnv
PiperOrigin-RevId: 644318962
2024-06-18 03:20:22 -07:00
Jan Wassenberg
a07f60c9a1
1.15x 7b sfp prefill speedup: Matmul in attention
2b bf16:
prefill 114.456 -> 115.222
decode 16.8847 -> 16.9987
7b sfp:
prefill 18.8575 -> 21.7325
decode 5.68428 -> 5.79791
PiperOrigin-RevId: 644283676
2024-06-18 01:00:51 -07:00
Jan Wassenberg
355f7b4f80
Update developer docs and mention asan/msan
PiperOrigin-RevId: 644000220
2024-06-17 07:29:12 -07:00
Jan Wassenberg
704d936764
Further simplification to ForEachTensor, thanks I.K.
PiperOrigin-RevId: 643996210
2024-06-17 07:12:26 -07:00
Jan Wassenberg
7d0720675f
Move raw_weights into separate header, used mainly by compress_weights.
Fix warnings in backprop/* (include)
PiperOrigin-RevId: 643983136
2024-06-17 06:17:02 -07:00
Jan Wassenberg
ad790d89d1
Fix DASSERT - TiledBatch requires at least 2 vectors.
Also use shorthand for weight types.
PiperOrigin-RevId: 643958371
2024-06-17 04:29:01 -07:00
The gemma.cpp Authors
7dbfa44794
Refactor CompressedWeights.
PiperOrigin-RevId: 643934198
2024-06-17 02:54:54 -07:00
Ray Smith
e0afdfa8fb
Added bias vector addition to MatMul
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
The gemma.cpp Authors
2228055bb8
Internal change.
PiperOrigin-RevId: 643330703
2024-06-14 06:53:41 -07:00