Jan Wassenberg
8e028632f7
0.98x prefill: refactor in prep for cache blocking.
...
Slower because we now init tiles of C and accumulate into them.
Also remove unused var in optimize_test and use BF16 typedef.
PiperOrigin-RevId: 662115916
2024-08-12 09:26:29 -07:00
Jan Wassenberg
1617e1a33d
SFP speedup: 1.14x f32, 1.19x bf16 dot = 1.02x prefill
...
12->9 ops by recognizing the upper/lower bytes are simply shifted.
PiperOrigin-RevId: 659609241
2024-08-05 10:59:13 -07:00
Jan Wassenberg
6ea4232b2e
MatMul cleanup: Mat struct, simplify args.
...
Add large benchmark to test, use 4 threads, skip some targets.
Also use Traits::Name instead of typeid.
PiperOrigin-RevId: 657496185
2024-07-30 01:55:50 -07:00
Thomas Fischbacher
d9f86f8e4d
Add Python code for converting Griffin Orbax weights. Refs #301
...
PiperOrigin-RevId: 657296255
2024-07-29 12:53:30 -07:00
The gemma.cpp Authors
c1f243c351
Fix setting scales in Py binding
...
PiperOrigin-RevId: 655284183
2024-07-23 13:32:50 -07:00
Daniel Keysers
e87e65ca45
Add scale parameter to MatMul.
...
Add accessor to CompressedArray that asserts the scale is 1 and use it.
PiperOrigin-RevId: 653604840
2024-07-18 06:58:56 -07:00
Daniel Keysers
ff34370aac
Simplify FFW by using MatMul_4x4_Batch_Add.
...
Affects only the griffin model, where prefill TPS improves by about 70%.
PiperOrigin-RevId: 652878176
2024-07-16 09:41:23 -07:00
Andrey Vlasov
3e92088595
Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing
...
SfpCodec::Dec2F and CompressTraits<T>::Decompress2 for all supported types. This also allows removing one of the GEMM_4x4_Tile specializations, so a single function handles compressed MatB. As before, computations use 32-bit registers even when MatA is bf16.
Measurements for a 2b-it SFP-encoded model on an AMD Ryzen Threadripper PRO 3945WX 12-Cores:
baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```
this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```
PiperOrigin-RevId: 651369694
2024-07-11 05:13:39 -07:00
Kan Wu
f519ab6693
Refactor configurables.
...
PiperOrigin-RevId: 651259154
2024-07-10 21:30:58 -07:00
Jan Wassenberg
cbb67b4ee0
Move benchmark_helper to evals/, weights_raw to compression/.
...
PiperOrigin-RevId: 650155983
2024-07-08 01:13:23 -07:00
Jan Wassenberg
f823371691
Cleanup: move util/compress and convert_weights to compression/
...
Also remove unused models/, lint convert_weights
PiperOrigin-RevId: 649613088
2024-07-05 04:16:52 -07:00
Jan Wassenberg
41efec4dba
Add Py bindings for weight compression
...
TODO: this uses clif instead of pybind11, and depends on absl.
PiperOrigin-RevId: 649575815
2024-07-05 01:06:00 -07:00
Jan Wassenberg
d3c6a45b59
Major duplicated code reduction in test/benchmarks
...
Helper functions to tokenize/wrap
Move LayersOutputFunc into RuntimeConfig
AcceptFunc passes the probability
Implement StringFromType using the parser, and verify results match
PiperOrigin-RevId: 643255119
2024-06-14 00:16:25 -07:00
Jan Wassenberg
a0e808e341
Add compression/ comments, especially on SFP range
...
PiperOrigin-RevId: 642238720
2024-06-11 05:47:49 -07:00
Jan Wassenberg
5c3e5f7038
Remove no longer required stats.h - use Highway version instead
...
PiperOrigin-RevId: 640440379
2024-06-05 01:37:48 -07:00
Paul Chang
175e389c3c
revert back to HWY_ASSERT for lane constraints, qualify hn::Add
...
PiperOrigin-RevId: 640193239
2024-06-04 10:10:18 -07:00
Jan Wassenberg
4f9155d8c6
Add bf16 matmul support, update naming+test
...
Avoid int32, which can easily overflow for large matrices.
Also fix IDE warning in sfp-inl.
PiperOrigin-RevId: 640149845
2024-06-04 07:41:46 -07:00
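The int32 remark in the commit above can be illustrated with a minimal sketch. `FlatIndex`, `FlatIndex32` and `CheckFlatIndex` are hypothetical helpers written for this note, not gemma.cpp functions. The 32-bit variant uses unsigned arithmetic so that the wraparound is well-defined; with a signed int32 the overflow would be undefined behavior and is already reached at 46341 x 46341 elements.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a gemma.cpp function): flat offset of element
// (row, col) in a row-major matrix, computed in 64 bits.
constexpr uint64_t FlatIndex(uint64_t row, uint64_t col, uint64_t cols) {
  return row * cols + col;
}

// The same offset computed in 32 bits. Unsigned is used so the wraparound is
// well-defined; with signed int32 the overflow would be undefined behavior.
constexpr uint32_t FlatIndex32(uint32_t row, uint32_t col, uint32_t cols) {
  return row * cols + col;
}

// A 65536 x 65536 matrix has 2^32 elements, one more than uint32_t can index.
inline void CheckFlatIndex() {
  assert(FlatIndex(65536, 0, 65536) == (uint64_t{1} << 32));
  assert(FlatIndex32(65536, 0, 65536) == 0);  // wrapped around to zero
}
```

Using a 64-bit (or `size_t`) index type keeps such offsets exact for any matrix that fits in memory.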
Zoltan Szabadka
36e4d8bbfe
Add first version of backpropagation support.
...
This is still in progress / experimental. It is currently implemented only
for normal gemma MQA attention layers, and the backward pass is not yet
parallelized.
Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
2024-06-04 08:37:49 +00:00
Jan Wassenberg
a44cbdadc2
Update to Highway 1.2 for topology/VQSelect
...
Also fix unused-warning in compress-inl.
PiperOrigin-RevId: 639116915
2024-05-31 12:29:10 -07:00
Paul Chang
c0643577c3
Minor internal refactoring.
...
PiperOrigin-RevId: 635852078
2024-05-21 10:29:59 -07:00
Paul Chang
cfce314715
Make BlobWriter::Add() accept const void*
...
PiperOrigin-RevId: 634780483
2024-05-17 08:11:06 -07:00
Jan Wassenberg
22fe9809ac
Fix SVE build: add missing hn::
...
PiperOrigin-RevId: 632481097
2024-05-10 06:49:26 -07:00
Jan Wassenberg
c5c9fc300c
Enable even/odd for SFP. Refs #166
...
Disable it for float32 because there is not enough benefit.
PiperOrigin-RevId: 631788326
2024-05-08 07:09:06 -07:00
Jan Wassenberg
f6d02b2870
Fix RecurrentGemma (refs #166) - one Dot was ignoring scale.
...
Remove extra Dot() overload
MatVecAdd always adds, use MatVecT<kAdd> if conditional.
Remove unused MatVecAddLoop and MatVecLoop
No longer tsan-verify even_odd
PiperOrigin-RevId: 631377279
2024-05-07 04:40:42 -07:00
Jan Wassenberg
b5a9ade75f
2x speedup of SFP decode (1.4x overall) on AVX3_DL+.
...
Thanks @nzmichaelh for suggesting table lookups!
PiperOrigin-RevId: 631337524
2024-05-07 01:46:43 -07:00
Zoltan Szabadka
429eb78512
Remove unused vars.
2024-05-03 13:37:17 +00:00
Sam Kaufman
f608337fef
Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo.
2024-04-29 14:13:07 -07:00
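As background for the even/odd commits here: a bf16 value is the upper 16 bits of a float32, so two bf16 values packed into one 32-bit lane can each be widened with a single shift or mask. The following is a scalar sketch of that idea under a little-endian packing assumption. It is not Highway's implementation, and `BitsToFloat`, `PromoteEvenScalar`, `PromoteOddScalar` and `CheckPromote` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets 32 bits as an IEEE-754 binary32 value.
inline float BitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical scalar stand-ins for a per-lane even/odd promotion.
// `pair` holds two bf16 values: element 0 in the low 16 bits, element 1 in
// the high 16 bits (little-endian order).
inline float PromoteEvenScalar(uint32_t pair) {
  return BitsToFloat(pair << 16);  // move the low half into the top bits
}
inline float PromoteOddScalar(uint32_t pair) {
  return BitsToFloat(pair & 0xFFFF0000u);  // keep the high half, zero the rest
}

inline void CheckPromote() {
  // bf16 1.0 is 0x3F80 and bf16 2.0 is 0x4000 (upper halves of the f32 bits).
  const uint32_t pair = (uint32_t{0x4000} << 16) | 0x3F80u;
  assert(PromoteEvenScalar(pair) == 1.0f);
  assert(PromoteOddScalar(pair) == 2.0f);
}
```

Because each half needs only one shift or mask, processing even and odd elements separately avoids an interleave step, provided the other operand is arranged in the same even/odd order.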
Sam Kaufman
5cb63346aa
supports_eo -> kSupportsEvenOdd
2024-04-29 12:51:35 -07:00
Sam Kaufman
0816a1070d
Even-odd layout MatVecs for bf16 weights.
2024-04-28 20:09:25 -07:00
Paul Chang
e8f59bb411
Fix underflow in NUQ ClusterCost()
...
PiperOrigin-RevId: 628137904
2024-04-25 11:28:51 -07:00
Jan Wassenberg
e9a0caed87
Further improve IO, enable multiple backends without -D.
...
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.
Plus lint fixes.
PiperOrigin-RevId: 626279004
2024-04-19 00:40:29 -07:00
Jan Wassenberg
a8ceb75f43
Improved IO abstraction layer
...
Move to unique_ptr-like File class.
Move `if OS_WIN` into wrapper functions.
exists -> Exists.
PiperOrigin-RevId: 625923056
2024-04-17 23:15:07 -07:00
Jan Wassenberg
a939b5fc9f
Update distortion.h to weighted average, add distortion_test.
...
More thorough checks in sfp_test and nuq_test.
nuq_test: use deterministic input generator.
PiperOrigin-RevId: 625602019
2024-04-17 01:44:19 -07:00
Jan Wassenberg
a982ec1287
Move code to gemma/ so we can remove error-prone copybara: comments.
...
Also fix includes and Lint warnings.
PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
Luca Versari
4c23932289
Improve weight handling.
...
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
Jan Wassenberg
7122afed5a
Add note on weight update and improve error message
...
PiperOrigin-RevId: 621849989
2024-04-04 07:17:27 -07:00
Jan Wassenberg
61e031fe98
Towards building tests without GUnit. Refs #29
...
PiperOrigin-RevId: 618032987
2024-03-21 19:28:02 -07:00
Jan Wassenberg
24add61dd9
Fix SFP/NUQ for bf16 rounding in Highway
...
SFP: Avoid rounding twice, and more robust TestDot.
NUQ: also more robust SNR, minor touchups to header.
PiperOrigin-RevId: 618030096
2024-03-21 19:06:19 -07:00
Jan Wassenberg
ba86c8d590
Remove obsolete copybara tags, faster bazel builds (debug)
...
PiperOrigin-RevId: 617576799
2024-03-21 04:19:02 +01:00
Eric Ye
89be4c3de8
No public description
...
PiperOrigin-RevId: 617315030
2024-03-21 04:18:36 +01:00
Jan Wassenberg
30b8a3c1ac
Fix build for RPi, missing hn::. Refs #112, thanks long568
...
PiperOrigin-RevId: 617704418
2024-03-20 20:07:49 -07:00
Jan Wassenberg
06cea2bcdb
Remove obsolete copybara tags, faster bazel builds (debug)
...
PiperOrigin-RevId: 617576799
2024-03-20 23:37:39 +01:00
Eric Ye
ffd02c59ad
No public description
...
PiperOrigin-RevId: 617315030
2024-03-20 23:37:12 +01:00
Jan Wassenberg
7d5364bb80
Remove obsolete copybara tags, faster bazel builds (debug)
...
PiperOrigin-RevId: 617576799
2024-03-20 11:31:59 -07:00
Jan Wassenberg
fce5c8c967
Avoid fadvise on older Android. Fixes #84
...
PiperOrigin-RevId: 613815953
2024-03-07 22:19:22 -08:00
Jan Wassenberg
bb9b023502
Support Bazel builds. Fixes #16
...
Also fix nuq/sfp-inl: warning, cast, and disable SCALAR
PiperOrigin-RevId: 612704056
2024-03-04 22:07:25 -08:00
Copybara-Service
cd7468199c
Merge pull request #65 from enum-class:narrowing-issues
...
PiperOrigin-RevId: 612279564
2024-03-03 18:51:59 -08:00
Jan Wassenberg
b6aaf6bbb8
Fix for Android's 32-bit off_t. Fixes #62
...
PiperOrigin-RevId: 611249534
2024-02-28 15:30:19 -08:00
Jan Wassenberg
272f17ddb3
Warning fixes: unused member, cast, unused function
...
PiperOrigin-RevId: 611074887
2024-02-28 05:54:22 -08:00
enum-class
06dd013397
Add clang-tidy, fix narrowing issues, fix constness
2024-02-28 20:04:09 +08:00