Copybara-Service
bafb8382f8
Merge pull request #175 from szabadka:gemma2
...
PiperOrigin-RevId: 630044058
2024-05-02 06:27:15 -07:00
Zoltan Szabadka
0afa480d90
Use more parallelism in the final output of the attention block.
...
We use MatVec instead of MatVecLoop for the per-head dense layers,
because we can parallelize more on the rows of the matrix than
on the number of heads. This will be even more efficient after
we rearrange the weights and can have a single MatVec operation.
Benchmark results (summarization with 1600 tokens for prefill
and essay writing with 500 tokens for generation):
```
              Prefill speed           Generation speed
Num threads   BEFORE      AFTER       BEFORE      AFTER
32            58.24 t/s   61.79 t/s   32.11 t/s   32.62 t/s
64            83.62 t/s   92.00 t/s   41.10 t/s   41.80 t/s
```
2024-05-02 09:30:07 +00:00
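The row-level parallelism described in 0afa480d90 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `SplitRows`, `MatVecRows` and `MatVecByRowRanges` are hypothetical names, the matrix is assumed to be row-major float, and the row ranges are run one after another here where a thread pool would run them concurrently. The point is that a matrix has many more rows than the model has heads, so splitting by rows yields more independent tasks.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, num_rows) into at most num_workers
// contiguous, nearly equal row ranges.
std::vector<std::pair<size_t, size_t>> SplitRows(size_t num_rows,
                                                 size_t num_workers) {
  assert(num_workers > 0);
  std::vector<std::pair<size_t, size_t>> ranges;
  const size_t base = num_rows / num_workers;
  const size_t extra = num_rows % num_workers;
  size_t begin = 0;
  for (size_t w = 0; w < num_workers && begin < num_rows; ++w) {
    const size_t len = base + (w < extra ? 1 : 0);
    ranges.emplace_back(begin, begin + len);
    begin += len;
  }
  return ranges;
}

// Computes out[r] = dot(row r of mat, vec) for r in [begin, end).
// mat is row-major with vec.size() columns.
void MatVecRows(const std::vector<float>& mat, const std::vector<float>& vec,
                size_t begin, size_t end, std::vector<float>& out) {
  const size_t num_cols = vec.size();
  for (size_t r = begin; r < end; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < num_cols; ++c) {
      sum += mat[r * num_cols + c] * vec[c];
    }
    out[r] = sum;
  }
}

// Each range writes a disjoint slice of out, so the ranges are independent
// tasks. They run sequentially here; a pool would run them in parallel.
std::vector<float> MatVecByRowRanges(const std::vector<float>& mat,
                                     const std::vector<float>& vec,
                                     size_t num_rows, size_t num_workers) {
  assert(mat.size() == num_rows * vec.size());
  std::vector<float> out(num_rows, 0.0f);
  for (const auto& range : SplitRows(num_rows, num_workers)) {
    MatVecRows(mat, vec, range.first, range.second, out);
  }
  return out;
}
```

With only a handful of heads, a per-head loop cannot keep 32 or 64 threads busy, whereas `SplitRows` yields one range per worker as soon as the matrix has at least that many rows.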
Sam Kaufman
4a6173d929
Remove unused vars.
2024-05-02 00:41:44 -07:00
Sam Kaufman
564937ede6
Merge branch 'dev' into deinterleave-vecs
2024-04-30 16:23:04 -07:00
Sam Kaufman
2829ef17ad
Check for HWY_NATIVE_DOT_BF16.
2024-04-30 15:19:28 -07:00
Sam Kaufman
59ebecce22
Fix: specialized MatVecAdd was never called.
2024-04-30 15:17:27 -07:00
Jan Wassenberg
12fb2f05cf
Add per-thread even_odd storage for #166.
...
Also inline ProjQ and ProjKV lambdas,
add missing includes/deps for ops_test.
PiperOrigin-RevId: 629460608
2024-04-30 10:42:23 -07:00
Copybara-Service
8f04a8346d
Merge pull request #172 from szabadka:gemma2
...
PiperOrigin-RevId: 629438917
2024-04-30 09:33:38 -07:00
Zoltan Szabadka
f8ccb8e37c
Fix kv offset computation for MHA config.
2024-04-30 16:19:14 +00:00
Copybara-Service
374fd7478a
Merge pull request #170 from szabadka:gemma2
...
PiperOrigin-RevId: 629408279
2024-04-30 07:40:30 -07:00
Zoltan Szabadka
afaca4efa8
Use more parallelism in the QKV projections in MQA mode.
...
Instead of MatVecLoop, we use MatVec and we combine k and v
into one 2 * kQKVDim long vector so that K and V projections
can be combined into one MatVec operation.
Benchmark results (summarization with 1600 tokens for prefill
and essay writing with 500 tokens for generation):
```
              Prefill speed           Generation speed
Num threads   BEFORE      AFTER       BEFORE      AFTER
4              9.81 t/s    9.96 t/s    8.39 t/s    8.46 t/s
18            31.50 t/s   36.67 t/s   23.10 t/s   25.83 t/s
32            45.36 t/s   58.91 t/s   27.60 t/s   31.25 t/s
64            57.72 t/s   80.64 t/s   35.40 t/s   39.76 t/s
```
2024-04-30 13:10:14 +00:00
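The combined K/V projection from afaca4efa8 can be illustrated with a small sketch. It assumes, purely for illustration, that the K weights and V weights are each row-major matrices of kQKVDim rows and that stacking them row-wise gives one matrix of 2 * kQKVDim rows; `MatVec` and `StackKV` below are hypothetical stand-ins, not the repository's functions. One matrix-vector product then yields K in the first half of the output and V in the second half.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain row-major matrix-vector product: out[r] = dot(row r of mat, vec).
std::vector<float> MatVec(const std::vector<float>& mat,
                          const std::vector<float>& vec, size_t num_rows) {
  const size_t num_cols = vec.size();
  assert(mat.size() == num_rows * num_cols);
  std::vector<float> out(num_rows, 0.0f);
  for (size_t r = 0; r < num_rows; ++r) {
    for (size_t c = 0; c < num_cols; ++c) {
      out[r] += mat[r * num_cols + c] * vec[c];
    }
  }
  return out;
}

// Assumed layout: all rows of W_k followed by all rows of W_v.
std::vector<float> StackKV(const std::vector<float>& w_k,
                           const std::vector<float>& w_v) {
  std::vector<float> w_kv(w_k);
  w_kv.insert(w_kv.end(), w_v.begin(), w_v.end());
  return w_kv;
}
```

Calling `MatVec(StackKV(w_k, w_v), x, 2 * qkv_dim)` returns the same values as two separate products, but as a single operation with twice as many rows to distribute across threads.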
Copybara-Service
befe9fb07e
Merge pull request #167 from szabadka:gemma2
...
PiperOrigin-RevId: 629325219
2024-04-30 01:00:37 -07:00
Sam Kaufman
6a78a23f4c
Abstracted some MatVecAdd spec. dupes.
2024-04-29 16:23:38 -07:00
Sam Kaufman
f608337fef
Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo.
2024-04-29 14:13:07 -07:00
Sam Kaufman
aa0b113214
(VecT*) to static_cast<VecT*>.
2024-04-29 12:53:47 -07:00
Sam Kaufman
5cb63346aa
supports_eo -> kSupportsEvenOdd
2024-04-29 12:51:35 -07:00
Zoltan Szabadka
27117cc39f
Simplify threading: remove the use of inner_pool.
...
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.
Benchmark results on a 1600-token summarization task:
```
              Prefill speed
Num threads   BEFORE      AFTER
4              9.24 t/s    9.76 t/s
18            31.41 t/s   31.16 t/s
32            31.41 t/s   45.13 t/s
64            31.03 t/s   57.85 t/s
```
2024-04-29 16:07:30 +00:00
Paul Chang
1d18c5a129
Improve documentation for compress_weights flags
...
PiperOrigin-RevId: 629053191
2024-04-29 06:49:50 -07:00
Sam Kaufman
0816a1070d
Even-odd layout MatVecs for bf16 weights.
2024-04-28 20:09:25 -07:00
Jan Wassenberg
7a12e29027
Add error-checking for py binding, add missing include+hwasan check
...
PiperOrigin-RevId: 628453112
2024-04-26 10:59:41 -07:00
Paul Chang
e8f59bb411
Fix underflow in NUQ ClusterCost()
...
PiperOrigin-RevId: 628137904
2024-04-25 11:28:51 -07:00
Phil Culliton
9e0ac5de34
Update Clif wrapper to work with latest gemma.cpp and add simple example
...
PiperOrigin-RevId: 628134201
2024-04-25 11:17:16 -07:00
Paul Chang
2d4de6b08b
Support absolute positional embeddings from vanilla transformer
...
PiperOrigin-RevId: 628100831
2024-04-25 09:32:14 -07:00
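For context on 2d4de6b08b, the textbook sinusoidal positional embedding of the original transformer is PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). The sketch below adds it to one activation vector. The function name is hypothetical, and the interleaved sin/cos layout is the textbook one; the layout and timescale details used in the repository may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Adds the textbook sinusoidal embedding for position pos to x in place.
// Even indices receive the sine term, odd indices the cosine term.
void AddSinusoidalPositionalEmbedding(std::vector<float>& x, size_t pos) {
  const size_t dim_model = x.size();
  assert(dim_model % 2 == 0);
  for (size_t i = 0; i + 1 < dim_model; i += 2) {
    // i is already the even index 2i of the formula, so the exponent is i / d.
    const double exponent =
        static_cast<double>(i) / static_cast<double>(dim_model);
    const double angle = static_cast<double>(pos) / std::pow(10000.0, exponent);
    x[i] += static_cast<float>(std::sin(angle));
    x[i + 1] += static_cast<float>(std::cos(angle));
  }
}
```

Unlike rotary embeddings, which rotate the query and key vectors inside attention, this variant is added once to the token embeddings and depends only on the absolute position.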
Paul Chang
75eca87039
Simplify prefill early-exit (originally Merge #156)
...
PiperOrigin-RevId: 627788524
2024-04-24 11:11:42 -07:00
Copybara-Service
b27d8d6b92
Merge pull request #156 from zeerd:dev
...
PiperOrigin-RevId: 627706909
2024-04-24 06:19:14 -07:00
Charles Chan
ea45d7c4d7
Use a lambda to split the function, and let stream_token break out of prefill, too
2024-04-23 22:55:01 +08:00
Paul Chang
e8d29792ac
New token validity assertions, improve prompt truncation warning
...
PiperOrigin-RevId: 627376194
2024-04-23 07:05:59 -07:00
Jan Wassenberg
3bf22abb22
Fix sign comparison warnings
...
PiperOrigin-RevId: 627299902
2024-04-23 01:16:51 -07:00
Jan Wassenberg
ca971ef50f
Document weight conversion
...
PiperOrigin-RevId: 626957718
2024-04-22 01:58:30 -07:00
Jan Wassenberg
e9a0caed87
Further improve IO, enable multiple backends without -D.
...
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.
Plus lint fixes.
PiperOrigin-RevId: 626279004
2024-04-19 00:40:29 -07:00
Paul Chang
38f1ea9b80
Eliminate redundant copies of TokenString()
...
Move this function outside of HWY_NAMESPACE since it doesn't need to be
optimized for any particular architecture.
PiperOrigin-RevId: 626098641
2024-04-18 11:31:50 -07:00
Jan Wassenberg
a8ceb75f43
Improved IO abstraction layer
...
Move to unique_ptr-like File class.
Move `if OS_WIN` into wrapper functions.
exists -> Exists.
PiperOrigin-RevId: 625923056
2024-04-17 23:15:07 -07:00
Jan Wassenberg
a939b5fc9f
Update distortion.h to weighted average, add distortion_test.
...
More thorough checks in sfp_test and nuq_test.
nuq_test: use deterministic input generator.
PiperOrigin-RevId: 625602019
2024-04-17 01:44:19 -07:00
Copybara-Service
05e7e2b2bb
Merge pull request #145 from atorero:dev
...
PiperOrigin-RevId: 624221085
2024-04-12 10:27:18 -07:00
Andrey Mikhaylov
4ef3da733a
Fixed minor things and added comments.
2024-04-12 15:39:16 +00:00
Andrey Mikhaylov
2c5706f159
Add comments regarding layers output usage.
2024-04-12 15:39:16 +00:00
Andrey Mikhaylov
03284d752e
Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file.
2024-04-12 15:39:16 +00:00
Copybara-Service
342e998cb6
Merge pull request #142 from ufownl:refactor/data_structures
...
PiperOrigin-RevId: 623503486
2024-04-10 08:35:18 -07:00
RangerUFO
e541707caa
Rename the fields of Griffin weights
2024-04-10 21:04:31 +08:00
RangerUFO
4e960d67f6
Fix typos
2024-04-10 20:38:18 +08:00
RangerUFO
809bd0709d
Refactor data structures to reduce memory usage
2024-04-10 19:35:23 +08:00
Jan Wassenberg
54120a5571
Mention Makefile contributed by @jart
...
PiperOrigin-RevId: 623436818
2024-04-10 03:21:10 -07:00
Jan Wassenberg
881eeffe0a
Lint fixes: strcat, includes, arg naming
...
PiperOrigin-RevId: 623435210
2024-04-10 03:12:41 -07:00
Copybara-Service
da91f4c4be
Merge pull request #137 from zond:main
...
PiperOrigin-RevId: 623255639
2024-04-09 12:57:57 -07:00
Copybara-Service
827fec1904
Merge pull request #139 from ufownl:feature/public_layers
...
PiperOrigin-RevId: 623254705
2024-04-09 12:54:23 -07:00
RangerUFO
2099b37732
Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs
2024-04-09 20:44:41 +08:00
Jan Wassenberg
a982ec1287
Move code to gemma/ so we can remove error-prone copybara: comments.
...
Also fix includes and Lint warnings.
PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
zond
9ca662dc14
Clarified README
...
Made it more visible that the recurrent weights are at a different Kaggle page.
2024-04-09 09:58:47 +02:00
Copybara-Service
83dd08ac87
Merge pull request #136 from pculliton:griffin
...
PiperOrigin-RevId: 623054233
2024-04-08 22:29:24 -07:00
Luca Versari
9c3f969405
Implement the Griffin model.
...
Also implement support for some model variations:
- Local attention.
- Add support for biases.
- Use RoPE only on half vectors.
- Support different order of QKV weights.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Martin Bruse <zondolfin@gmail.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-08 21:45:54 +02:00