Copybara-Service
374fd7478a
Merge pull request #170 from szabadka:gemma2
...
PiperOrigin-RevId: 629408279
2024-04-30 07:40:30 -07:00
Zoltan Szabadka
afaca4efa8
Use more parallelism in the QKV projections in MQA mode.
...
Instead of MatVecLoop, we use MatVec, and we combine k and v
into one vector of length 2 * kQKVDim so that the K and V
projections can be computed with a single MatVec operation.
Benchmark results (summarization with 1600 tokens for prefill
and essay writing with 500 tokens for generation):
```
              Prefill speed           Generation speed
Num threads   BEFORE      AFTER       BEFORE      AFTER
4              9.81 t/s    9.96 t/s    8.39 t/s    8.46 t/s
18            31.50 t/s   36.67 t/s   23.10 t/s   25.83 t/s
32            45.36 t/s   58.91 t/s   27.60 t/s   31.25 t/s
64            57.72 t/s   80.64 t/s   35.40 t/s   39.76 t/s
```
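A minimal sketch of the combined K/V projection idea described in this commit message, assuming the K and V weight rows are stored back to back as one (2 * qkv_dim) x model_dim row-major matrix. `MatVecSketch` and `ProjectKV` are hypothetical names for illustration only; they are not the gemma.cpp `MatVec` signature, and the real code is vectorized and multi-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major matrix-vector product: out[r] = dot(mat[r, :], x).
// Hypothetical helper, not the gemma.cpp MatVec.
inline void MatVecSketch(const std::vector<float>& mat, size_t rows,
                         size_t cols, const std::vector<float>& x,
                         float* out) {
  assert(mat.size() == rows * cols && x.size() == cols);
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * x[c];
    out[r] = sum;
  }
}

// Computes K and V in one call by treating the stacked K and V rows as a
// single matrix with 2 * qkv_dim rows (assumed layout). The result holds K
// in entries [0, qkv_dim) and V in entries [qkv_dim, 2 * qkv_dim).
inline std::vector<float> ProjectKV(const std::vector<float>& kv_weights,
                                    size_t qkv_dim, size_t model_dim,
                                    const std::vector<float>& x) {
  std::vector<float> kv(2 * qkv_dim);
  MatVecSketch(kv_weights, 2 * qkv_dim, model_dim, x, kv.data());
  return kv;
}
```

A single call over 2 * qkv_dim rows gives a row-parallel implementation twice as many rows to distribute as two separate calls over qkv_dim rows each, which is one plausible reading of why the gains in the table grow with the thread count.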
2024-04-30 13:10:14 +00:00
Copybara-Service
befe9fb07e
Merge pull request #167 from szabadka:gemma2
...
PiperOrigin-RevId: 629325219
2024-04-30 01:00:37 -07:00
Sam Kaufman
6a78a23f4c
Abstracted some duplicated MatVecAdd specializations.
2024-04-29 16:23:38 -07:00
Sam Kaufman
f608337fef
Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo.
2024-04-29 14:13:07 -07:00
Sam Kaufman
aa0b113214
(VecT*) to static_cast<VecT*>.
2024-04-29 12:53:47 -07:00
Sam Kaufman
5cb63346aa
supports_eo -> kSupportsEvenOdd
2024-04-29 12:51:35 -07:00
Zoltan Szabadka
27117cc39f
Simplify threading: remove the use of inner_pool.
...
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.
Benchmark results on a 1600-token summarization task:
```
              Prefill speed
Num threads   BEFORE      AFTER
4              9.24 t/s    9.76 t/s
18            31.41 t/s   31.16 t/s
32            31.41 t/s   45.13 t/s
64            31.03 t/s   57.85 t/s
```
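A minimal sketch of parallelizing a matrix-vector multiplication over its rows with a single level of threading, which is the kind of parallelism this commit message relies on. It assumes a row-major matrix and uses `std::thread` directly; `ParallelMatVec` is a hypothetical name, and gemma.cpp uses its own thread pool rather than spawning threads per call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Splits the rows into contiguous chunks and computes each chunk on its own
// thread. Each thread writes to a disjoint range of `out`, so no locking is
// needed. Illustration only; not the gemma.cpp implementation.
inline std::vector<float> ParallelMatVec(const std::vector<float>& mat,
                                         size_t rows, size_t cols,
                                         const std::vector<float>& x,
                                         size_t num_threads) {
  assert(mat.size() == rows * cols && x.size() == cols && num_threads > 0);
  std::vector<float> out(rows, 0.0f);
  // Ceiling division so that every row is assigned to some chunk.
  const size_t rows_per_thread = (rows + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (size_t begin = 0; begin < rows; begin += rows_per_thread) {
    const size_t end = std::min(rows, begin + rows_per_thread);
    workers.emplace_back([&mat, &x, &out, cols, begin, end] {
      for (size_t r = begin; r < end; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * x[c];
        out[r] = sum;
      }
    });
  }
  for (std::thread& t : workers) t.join();
  return out;
}
```

When the matrix has many more rows than there are threads, every thread gets a full chunk of work, so a second, nested pool adds nothing.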
2024-04-29 16:07:30 +00:00
Paul Chang
1d18c5a129
Improve documentation for compress_weights flags
...
PiperOrigin-RevId: 629053191
2024-04-29 06:49:50 -07:00
Sam Kaufman
0816a1070d
Even-odd layout MatVecs for bf16 weights.
2024-04-28 20:09:25 -07:00
Jan Wassenberg
7a12e29027
Add error-checking for py binding, add missing include+hwasan check
...
PiperOrigin-RevId: 628453112
2024-04-26 10:59:41 -07:00
Paul Chang
e8f59bb411
Fix underflow in NUQ ClusterCost()
...
PiperOrigin-RevId: 628137904
2024-04-25 11:28:51 -07:00
Phil Culliton
9e0ac5de34
Update Clif wrapper to work with latest gemma.cpp and add simple example
...
PiperOrigin-RevId: 628134201
2024-04-25 11:17:16 -07:00
Paul Chang
2d4de6b08b
Support absolute positional embeddings from vanilla transformer
...
PiperOrigin-RevId: 628100831
2024-04-25 09:32:14 -07:00
Paul Chang
75eca87039
Simplify prefill early-exit (originally Merge #156)
...
PiperOrigin-RevId: 627788524
2024-04-24 11:11:42 -07:00
Copybara-Service
b27d8d6b92
Merge pull request #156 from zeerd:dev
...
PiperOrigin-RevId: 627706909
2024-04-24 06:19:14 -07:00
Charles Chan
ea45d7c4d7
Use lambda to split function and let stream_token break prefill, too
2024-04-23 22:55:01 +08:00
Paul Chang
e8d29792ac
New token validity assertions, improve prompt truncation warning
...
PiperOrigin-RevId: 627376194
2024-04-23 07:05:59 -07:00
Jan Wassenberg
3bf22abb22
Fix sign comparison warnings
...
PiperOrigin-RevId: 627299902
2024-04-23 01:16:51 -07:00
Jan Wassenberg
ca971ef50f
Document weight conversion
...
PiperOrigin-RevId: 626957718
2024-04-22 01:58:30 -07:00
Jan Wassenberg
e9a0caed87
Further improve IO, enable multiple backends without -D.
...
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.
Plus lint fixes.
PiperOrigin-RevId: 626279004
2024-04-19 00:40:29 -07:00
Paul Chang
38f1ea9b80
Eliminate redundant copies of TokenString()
...
Move this function outside of HWY_NAMESPACE since it doesn't need to be
optimized for any particular architecture.
PiperOrigin-RevId: 626098641
2024-04-18 11:31:50 -07:00
Jan Wassenberg
a8ceb75f43
Improved IO abstraction layer
...
Move to unique_ptr-like File class.
Move `if OS_WIN` into wrapper functions.
exists -> Exists.
PiperOrigin-RevId: 625923056
2024-04-17 23:15:07 -07:00
Jan Wassenberg
a939b5fc9f
Update distortion.h to weighted average, add distortion_test.
...
More thorough checks in sfp_test and nuq_test.
nuq_test: use deterministic input generator.
PiperOrigin-RevId: 625602019
2024-04-17 01:44:19 -07:00
Copybara-Service
05e7e2b2bb
Merge pull request #145 from atorero:dev
...
PiperOrigin-RevId: 624221085
2024-04-12 10:27:18 -07:00
Andrey Mikhaylov
4ef3da733a
Fixed minor things and added comments.
2024-04-12 15:39:16 +00:00
Andrey Mikhaylov
2c5706f159
Add comments regarding layers output usage.
2024-04-12 15:39:16 +00:00
Andrey Mikhaylov
03284d752e
Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file.
2024-04-12 15:39:16 +00:00
Copybara-Service
342e998cb6
Merge pull request #142 from ufownl:refactor/data_structures
...
PiperOrigin-RevId: 623503486
2024-04-10 08:35:18 -07:00
RangerUFO
e541707caa
Rename the fields of Griffin weights
2024-04-10 21:04:31 +08:00
RangerUFO
4e960d67f6
Fix typos
2024-04-10 20:38:18 +08:00
RangerUFO
809bd0709d
Refactor data structures to reduce memory usage
2024-04-10 19:35:23 +08:00
Jan Wassenberg
54120a5571
Mention Makefile contributed by @jart
...
PiperOrigin-RevId: 623436818
2024-04-10 03:21:10 -07:00
Jan Wassenberg
881eeffe0a
Lint fixes: strcat, includes, arg naming
...
PiperOrigin-RevId: 623435210
2024-04-10 03:12:41 -07:00
Copybara-Service
da91f4c4be
Merge pull request #137 from zond:main
...
PiperOrigin-RevId: 623255639
2024-04-09 12:57:57 -07:00
Copybara-Service
827fec1904
Merge pull request #139 from ufownl:feature/public_layers
...
PiperOrigin-RevId: 623254705
2024-04-09 12:54:23 -07:00
RangerUFO
2099b37732
Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs
2024-04-09 20:44:41 +08:00
Jan Wassenberg
a982ec1287
Move code to gemma/ so we can remove error-prone copybara: comments.
...
Also fix includes and Lint warnings.
PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
zond
9ca662dc14
Clarified README
...
Made it more visible that the recurrent weights are at a different Kaggle page.
2024-04-09 09:58:47 +02:00
Copybara-Service
83dd08ac87
Merge pull request #136 from pculliton:griffin
...
PiperOrigin-RevId: 623054233
2024-04-08 22:29:24 -07:00
Luca Versari
9c3f969405
Implement the Griffin model.
...
Also implement support for some model variations:
- Local attention.
- Add support for biases.
- Use RoPE only on half vectors.
- Support different order of QKV weights.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Martin Bruse <zondolfin@gmail.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-08 21:45:54 +02:00
Jan Wassenberg
4326249d0a
Fix includes
...
PiperOrigin-RevId: 622456877
2024-04-06 09:27:09 -07:00
Jan Wassenberg
a3a0f78fda
Merge pull request #131 from veluca93:benchmark-and-test
...
PiperOrigin-RevId: 622452794
2024-04-06 18:06:03 +02:00
Jan Wassenberg
9e51a91cfc
Faster bazel builds by only building all local targets.
...
PiperOrigin-RevId: 622442126
2024-04-06 18:05:49 +02:00
Luca Versari
5862d1f995
Add a benchmark and additional tests.
...
Also add a script to help running sanitizer builds, and do some cleanup.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Sami Boukortt <sboukortt@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 12:54:52 +02:00
Jan Wassenberg
d852cf5089
Remove unused includes
...
PiperOrigin-RevId: 622412150
2024-04-06 03:13:43 -07:00
Copybara-Service
325ef06cf9
Merge pull request #130 from veluca93:weight-handling
...
PiperOrigin-RevId: 622405491
2024-04-06 02:22:00 -07:00
Luca Versari
4c23932289
Improve weight handling.
...
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
Copybara-Service
280b8cb8a1
Merge pull request #129 from veluca93:more-ops
...
PiperOrigin-RevId: 622145499
2024-04-05 05:02:00 -07:00
Luca Versari
6cdb8a45a0
Add more ops: Sigmoid, (Two)MatVecAdd. Faster TwoMatVec.
...
drive-by: some build system simplifications
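A scalar reference sketch of the ops named in the subject line, assuming the usual definitions: Sigmoid is the logistic function, and MatVecAdd computes a matrix-vector product plus a bias vector. The names `SigmoidSketch` and `MatVecAddSketch` are hypothetical; the ops added by this commit are SIMD implementations and their signatures may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic function: 1 / (1 + e^-x).
inline float SigmoidSketch(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// out[r] = dot(mat[r, :], x) + add[r], with mat stored row-major.
// Assumed semantics for illustration.
inline std::vector<float> MatVecAddSketch(const std::vector<float>& mat,
                                          size_t rows, size_t cols,
                                          const std::vector<float>& x,
                                          const std::vector<float>& add) {
  assert(mat.size() == rows * cols && x.size() == cols && add.size() == rows);
  std::vector<float> out(rows);
  for (size_t r = 0; r < rows; ++r) {
    float sum = add[r];  // Start from the bias instead of zero.
    for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * x[c];
    out[r] = sum;
  }
  return out;
}
```

Under the same assumption, a "Two" variant would apply two weight matrices and two bias vectors to the same input vector in one pass, so the input is read only once.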
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Lode Vandevenne <lode@google.com>
Co-authored-by: Martin Bruse <zondolfin@gmail.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-05 12:27:31 +02:00