Jan Wassenberg
3fe79b3876
Fix msan uninitialized scale
...
PiperOrigin-RevId: 653655471
2024-07-18 09:42:31 -07:00
Daniel Keysers
e87e65ca45
Add scale parameter to MatMul.
...
Add accessor to CompressedArray that asserts the scale is 1 and use it.
PiperOrigin-RevId: 653604840
2024-07-18 06:58:56 -07:00
Daniel Keysers
5a751a9a44
Update gemma-27b to the correct query scaling.
...
PiperOrigin-RevId: 653201646
2024-07-17 05:43:52 -07:00
Jan Wassenberg
992a2cbbc0
De-templatize Activations, add RowVectorBatch class
...
Also remove most kBatchSize args.
PiperOrigin-RevId: 653185525
2024-07-17 04:38:15 -07:00
Daniel Keysers
ff34370aac
Simplify FFW by using MatMul_4x4_Batch_Add.
...
Affects only the griffin model, where prefill TPS improves by about 70%.
PiperOrigin-RevId: 652878176
2024-07-16 09:41:23 -07:00
Paul Chang
48b900b1b9
Fix examples/hello_world for real.
...
PiperOrigin-RevId: 652509319
2024-07-15 09:38:52 -07:00
Jan Wassenberg
cd530374b3
Further 1.02x prefill speedup from batch 64->512
...
Measured on SKX. Larger speedup expected for Zen4/SPR.
PiperOrigin-RevId: 652472928
2024-07-15 07:26:10 -07:00
Paul Chang
aaee666a1d
Fix gemma_cpp/examples/hello_world build.
...
Include Bazel build rules, too.
PiperOrigin-RevId: 652469406
2024-07-15 07:11:01 -07:00
The gemma.cpp Authors
c879133a5a
Increase the prefill batch size to 64.
...
PiperOrigin-RevId: 651754772
2024-07-12 06:28:37 -07:00
The gemma.cpp Authors
df3fb70802
Improve readability with RepeatedAttentionWindowSizes
...
PiperOrigin-RevId: 651431738
2024-07-11 09:11:46 -07:00
Jan Wassenberg
edaf61b983
SVE build fix: avoid capturing vectors directly.
...
Also use more V typedef instead of auto.
PiperOrigin-RevId: 651423685
2024-07-11 08:43:56 -07:00
Jan Wassenberg
be765afce2
Simplify matmul: only 2 overloads
...
Also add StoreHorizontalSumsMaybeAdd wrapper function,
move MatMulSlowBatch into test.
1.02-1.06x speedup.
PiperOrigin-RevId: 651394791
2024-07-11 06:58:42 -07:00
Andrey Vlasov
3e92088595
Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing
...
SfpCodec::Dec2F and CompressTraits<T>::Decompress2 for all supported types. This also allows removing one of the GEMM_4x4_Tile specializations, so a single function handles compressed MatB. As before, computations use 32-bit registers even when MatA is bf16.
Measurements for a 2b-it sfp-encoded model on an AMD Ryzen Threadripper PRO 3945WX 12-Cores:
baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```
this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```
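The ratios implied by the two measurement blocks above can be checked with a short calculation. This is only a sketch of the arithmetic: the dictionary keys and the `speedup` helper are names chosen here for illustration and are not part of gemma.cpp. Throughput metrics count as higher-is-better and time to first token as lower-is-better.

```python
# Numbers copied from the baseline / "this change" measurements above.
baseline = {
    "prefill tokens / sec": 32.6254,
    "tokens / sec": 8.91429,
    "ms to first token": 115.0,
}
changed = {
    "prefill tokens / sec": 54.3045,
    "tokens / sec": 16.8191,
    "ms to first token": 56.0,
}

# Latency improves when it shrinks; throughput improves when it grows.
LOWER_IS_BETTER = {"ms to first token"}


def speedup(metric: str, before: float, after: float) -> float:
    """Return the improvement factor so that > 1.0 always means 'better'."""
    if metric in LOWER_IS_BETTER:
        return before / after
    return after / before


speedups = {m: speedup(m, baseline[m], changed[m]) for m in baseline}

for metric, factor in speedups.items():
    print(f"{metric}: {factor:.2f}x")
```

Under these assumptions the factors come out at roughly 1.66x for prefill, 1.89x for decoding, and 2.05x for time to first token.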
PiperOrigin-RevId: 651369694
2024-07-11 05:13:39 -07:00
Kan Wu
f519ab6693
Refactor configurables.
...
PiperOrigin-RevId: 651259154
2024-07-10 21:30:58 -07:00
Andrey Vlasov
960ff4b4ec
Record time measurements in MatMul tests.
...
PiperOrigin-RevId: 651060711
2024-07-10 10:04:40 -07:00
Jan Wassenberg
ee6e017a77
Fix Windows build: min conflict, unused VF
...
PiperOrigin-RevId: 650955138
2024-07-10 04:18:25 -07:00
Daniel Keysers
063bbaa683
Add more comments to attention computation (and some small restructuring).
...
PiperOrigin-RevId: 650929097
2024-07-10 02:39:07 -07:00
Daniel Keysers
cf76f0a401
Update gemma_test to also pass for the v1.1 models.
...
Make it an error if the model cannot be loaded.
PiperOrigin-RevId: 650232602
2024-07-08 06:45:37 -07:00
Jan Wassenberg
6a3f7cf3ea
Lint fix - string append, remove stale TODO
...
PiperOrigin-RevId: 650197468
2024-07-08 04:11:21 -07:00
Jan Wassenberg
cbb67b4ee0
Move benchmark_helper to evals/, weights_raw to compression/.
...
PiperOrigin-RevId: 650155983
2024-07-08 01:13:23 -07:00
Daniel Keysers
cdebcc3533
Update gemma_test with the expected entropy values for the IT models of size 2B/7B/9B/27B.
...
PiperOrigin-RevId: 649662047
2024-07-05 08:58:51 -07:00
Jan Wassenberg
438b1bace2
Fix handling of %c and %q if eot_string. Fixes #283, thanks @ljcucc
...
PiperOrigin-RevId: 649651535
2024-07-05 07:54:00 -07:00
Jan Wassenberg
f823371691
Cleanup: move util/compress and convert_weights to compression/
...
Also remove unused models/, lint convert_weights
PiperOrigin-RevId: 649613088
2024-07-05 04:16:52 -07:00
Jan Wassenberg
41efec4dba
Add Py bindings for weight compression
...
TODO: this uses clif instead of pybind11, and depends on absl.
PiperOrigin-RevId: 649575815
2024-07-05 01:06:00 -07:00
Jan Wassenberg
118e802b00
Fix gemma_test - moved to evals/.
...
PiperOrigin-RevId: 649338633
2024-07-04 02:04:05 -07:00
Jan Wassenberg
c7c3daa624
7x compile time speedup: shard gemma.cc
...
Use overloaded functions defined in gemma/instantiations.
Also split out activations.h.
PiperOrigin-RevId: 649053122
2024-07-03 06:35:04 -07:00
Daniel Keysers
a40165dea2
Small cleanups. Fixes gemma_test build.
...
PiperOrigin-RevId: 649008524
2024-07-03 03:13:38 -07:00
Kan Wu
7e4b20455e
Add sliding window attention for Gemma 2.
...
PiperOrigin-RevId: 648778253
2024-07-02 11:08:03 -07:00
Jan Wassenberg
09a7e75ead
Prep for sharding gemma.cc: split into kv_cache, tokenizer.
...
Move activations.h to backprop/ to make space for another activations.h.
PiperOrigin-RevId: 648744500
2024-07-02 09:31:06 -07:00
Jan Wassenberg
85fcd3cd80
Cleanup: add ModelInfo struct, remove gcpp::
...
PiperOrigin-RevId: 648707763
2024-07-02 07:11:15 -07:00
Jan Wassenberg
b1c1ec1d59
Use benchmark_helper in py bindings (adds BOS)
...
Also remove thread clamp (OK to be zero or large).
PiperOrigin-RevId: 648657155
2024-07-02 03:27:15 -07:00
Jan Wassenberg
e527e7662e
Remove unused kSystemPrompt
...
PiperOrigin-RevId: 648429567
2024-07-01 11:18:07 -07:00
Jan Wassenberg
af8eb2fde3
Declutter gemma/ directory, move binaries to evals/ and util/.
...
PiperOrigin-RevId: 648400795
2024-07-01 09:51:04 -07:00
Jan Wassenberg
e588a7f45d
Add config for att/final cap, skip max-subtract. Fixes #278
...
Also update includes/deps for backprop/.
PiperOrigin-RevId: 648399222
2024-07-01 09:45:26 -07:00
The gemma.cpp Authors
da7507e6f0
Add prompt batching to Gemma.cpp.
...
This CL adds a new function to Gemma that allows for batching of multiple prompts. The function takes a vector of prompts and returns a vector of responses. The prompts are processed in parallel, and the responses are returned in the same order as the prompts.
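The contract described above (a list of prompts in, a list of responses out, processed in parallel, same order) can be illustrated with a minimal sketch. It does not reproduce the gemma.cpp API: `generate_one` and `generate_batch` are hypothetical names, and the stand-in "model" simply upper-cases its input. The sketch relies on `Executor.map` from the Python standard library, which yields results in input order regardless of which task finishes first.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def generate_one(prompt: str) -> str:
    # Hypothetical stand-in for a single-prompt generation call.
    return prompt.upper()


def generate_batch(prompts: List[str], max_workers: int = 4) -> List[str]:
    """Process prompts in parallel; responses[i] corresponds to prompts[i]."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its inputs in its outputs.
        return list(pool.map(generate_one, prompts))


responses = generate_batch(["hello", "gemma", "batching"])
print(responses)
```

Keeping the output index-aligned with the input means callers do not need to carry request IDs to match responses back to prompts.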
PiperOrigin-RevId: 648367559
2024-07-01 07:51:31 -07:00
Paul Chang
8ac5d66575
Introduce new Gemma 9B and 27B configs
...
PiperOrigin-RevId: 647299080
2024-06-27 06:45:24 -07:00
Paul Chang
78e96fdc70
Refactor model type / training tables, simplify reverse mapping
...
PiperOrigin-RevId: 647069372
2024-06-26 13:59:14 -07:00
Paul Chang
aa57fc3952
Remove unused BUILD dependency
...
PiperOrigin-RevId: 646519547
2024-06-25 10:12:13 -07:00
The gemma.cpp Authors
7fc8ddf825
Fix a clang tidy warning
...
PiperOrigin-RevId: 646498062
2024-06-25 09:02:59 -07:00
The gemma.cpp Authors
ef786f1bfc
Use hwy::ThreadPool::MaxThreads() to determine the number of threads to use.
...
PiperOrigin-RevId: 646117298
2024-06-24 09:16:04 -07:00
The gemma.cpp Authors
12089417b5
Improve logging when running Gemma examples: fix the issue where max_tokens, max_generated_tokens and temperature were logged without any trailing space/newline.
...
PiperOrigin-RevId: 646014268
2024-06-24 02:00:34 -07:00
The gemma.cpp Authors
80b1347393
Skip the last RMSNormInplaceBatched in the Prefill phase.
...
That only modifies activations.x, but it is called with prefill_activations which are not used after the Prefill call.
PiperOrigin-RevId: 645391387
2024-06-21 08:04:22 -07:00
Copybara-Service
82f16087ba
Merge pull request #266 from ufownl:bugfix/kvcache
...
PiperOrigin-RevId: 645329504
2024-06-21 03:06:52 -07:00
Copybara-Service
c2efcb0da4
Merge pull request #267 from ufownl:bugfix/clang_ce
...
PiperOrigin-RevId: 645329422
2024-06-21 03:06:04 -07:00
RangerUFO
f7855251ea
Fix compilation errors in clang
...
The errors occur on the `ubuntu-latest` runner of GitHub Actions.
2024-06-21 13:40:40 +08:00
RangerUFO
d7787c8f6c
Fix KV cache size calculation error
2024-06-21 13:06:26 +08:00
Daniel Keysers
0570972d43
Fixing two typos.
...
PiperOrigin-RevId: 645103198
2024-06-20 11:33:12 -07:00
The gemma.cpp Authors
a85725614a
Refactor kCachePosSize and kCacheLayerSize into separate functors.
...
PiperOrigin-RevId: 645048519
2024-06-20 08:52:08 -07:00
Jan Wassenberg
48ebba8b7a
Code cleanup
...
- Simplify template arg list, enable deduction
- missing hn:: on " Lanes"
- 1.0f suffix
- move RMSNormBatched into ops.h
- static constexpr -> constexpr
- concrete type instead of LayerT, WeightArrayT
- inline GetWeights
- remove if (runtime_config.verbosity
- merge AllocatePrefill and AllocateDecode
- remove bf_ffw_hidden
PiperOrigin-RevId: 644931277
2024-06-20 01:10:24 -07:00
The gemma.cpp Authors
658fb3e506
Move test placeholder to a later pos.
...
PiperOrigin-RevId: 644808456
2024-06-19 13:24:10 -07:00