Commit Graph

621 Commits

Author SHA1 Message Date
Copybara-Service 374fd7478a Merge pull request #170 from szabadka:gemma2
PiperOrigin-RevId: 629408279
2024-04-30 07:40:30 -07:00
Zoltan Szabadka afaca4efa8 Use more parallelism in the QKV projections in MQA mode.
Instead of MatVecLoop, we use MatVec and we combine k and v
into one 2 * kQKVDim long vector so that K and V projections
can be combined into one MatVec operation.

Benchmark results (summarization with 1600 tokens for prefill
and essay writing with 500 tokens for generation):

```
                   Prefill speed                Generation speed
Num threads      BEFORE       AFTER            BEFORE       AFTER
4                 9.81 t/s     9.96 t/s       8.39 t/s     8.46 t/s
18               31.50 t/s    36.67 t/s      23.10 t/s    25.83 t/s
32               45.36 t/s    58.91 t/s      27.60 t/s    31.25 t/s
64               57.72 t/s    80.64 t/s      35.40 t/s    39.76 t/s
```
2024-04-30 13:10:14 +00:00
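
Editor's sketch of the K/V combination described in the commit above, assuming row-major float weights. The names `MatVecSketch` and `StackKV` are hypothetical and are not the gemma.cpp `MatVec` API. The idea illustrated: stacking the K rows on top of the V rows gives a matrix with 2 * kQKVDim rows, so one matrix-vector product yields k followed by v. A single call over twice as many rows presumably gives a thread pool more rows to distribute at once; that reading is an assumption, since the commit message does not spell it out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix-vector product: out[r] = sum over c of mat[r * cols + c] * vec[c].
// Hypothetical helper for illustration only; not the gemma.cpp MatVec signature.
std::vector<float> MatVecSketch(const std::vector<float>& mat, size_t rows,
                                size_t cols, const std::vector<float>& vec) {
  assert(mat.size() == rows * cols);
  assert(vec.size() == cols);
  std::vector<float> out(rows, 0.0f);
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * vec[c];
    out[r] = sum;
  }
  return out;
}

// Appends the V rows after the K rows. Both inputs are assumed to be
// row-major with the same number of columns, so the result is a valid
// row-major matrix with (K rows + V rows) rows.
std::vector<float> StackKV(const std::vector<float>& w_k,
                           const std::vector<float>& w_v) {
  std::vector<float> w_kv(w_k);
  w_kv.insert(w_kv.end(), w_v.begin(), w_v.end());
  return w_kv;
}
```

With a stacked matrix built once, `MatVecSketch(w_kv, 2 * qkv_dim, model_dim, x)` returns a vector whose first `qkv_dim` entries match the separate K projection and whose remaining entries match the separate V projection.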
Copybara-Service befe9fb07e Merge pull request #167 from szabadka:gemma2
PiperOrigin-RevId: 629325219
2024-04-30 01:00:37 -07:00
Sam Kaufman 6a78a23f4c Abstracted some MatVecAdd spec. dupes. 2024-04-29 16:23:38 -07:00
Sam Kaufman f608337fef Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo. 2024-04-29 14:13:07 -07:00
Sam Kaufman aa0b113214 (VecT*) to static_cast<VecT*>. 2024-04-29 12:53:47 -07:00
Sam Kaufman 5cb63346aa supports_eo -> kSupportsEvenOdd 2024-04-29 12:51:35 -07:00
Zoltan Szabadka 27117cc39f Simplify threading: remove the use of inner_pool.
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.

Benchmark results on a 1600-token summarization task:

```
               Prefill speed
Num threads    BEFORE         AFTER
4               9.24 t/s       9.76 t/s
18             31.41 t/s      31.16 t/s
32             31.41 t/s      45.13 t/s
64             31.03 t/s      57.85 t/s
```
2024-04-29 16:07:30 +00:00
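
Editor's sketch of row-level parallelism in a matrix-vector product, the source of parallelism the commit above relies on once inner_pool is removed. This is a minimal illustration using `std::thread`, not the thread pool or the SIMD kernels gemma.cpp actually uses; `ParallelMatVecSketch` and its parameters are hypothetical names. Each worker owns a contiguous block of output rows, so no synchronization is needed beyond the final join.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Computes out = mat * vec for a row-major matrix, splitting the rows into
// contiguous blocks and handing each block to its own thread.
// Hypothetical helper for illustration only.
std::vector<float> ParallelMatVecSketch(const std::vector<float>& mat,
                                        size_t rows, size_t cols,
                                        const std::vector<float>& vec,
                                        size_t num_threads) {
  assert(mat.size() == rows * cols);
  assert(vec.size() == cols);
  assert(num_threads > 0);
  std::vector<float> out(rows, 0.0f);
  // Ceiling division so that every row is assigned to some block.
  const size_t rows_per_thread = (rows + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = std::min(rows, t * rows_per_thread);
    const size_t end = std::min(rows, begin + rows_per_thread);
    if (begin == end) break;  // No rows left for this or any later thread.
    workers.emplace_back([&mat, &vec, &out, cols, begin, end] {
      // Each thread writes only out[begin, end), so writes never overlap.
      for (size_t r = begin; r < end; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * vec[c];
        out[r] = sum;
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
  return out;
}
```

The result is the same for any thread count because each row's dot product is computed by exactly one thread in the same order. On some toolchains the program needs to be linked with `-pthread`.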
Paul Chang 1d18c5a129 Improve documentation for compress_weights flags
PiperOrigin-RevId: 629053191
2024-04-29 06:49:50 -07:00
Sam Kaufman 0816a1070d Even-odd layout MatVecs for bf16 weights. 2024-04-28 20:09:25 -07:00
Jan Wassenberg 7a12e29027 Add error-checking for py binding, add missing include+hwasan check
PiperOrigin-RevId: 628453112
2024-04-26 10:59:41 -07:00
Paul Chang e8f59bb411 Fix underflow in NUQ ClusterCost()
PiperOrigin-RevId: 628137904
2024-04-25 11:28:51 -07:00
Phil Culliton 9e0ac5de34 Update Clif wrapper to work with latest gemma.cpp and add simple example
PiperOrigin-RevId: 628134201
2024-04-25 11:17:16 -07:00
Paul Chang 2d4de6b08b Support absolute positional embeddings from vanilla transformer
PiperOrigin-RevId: 628100831
2024-04-25 09:32:14 -07:00
Paul Chang 75eca87039 Simplify prefill early-exit (originally Merge #156)
PiperOrigin-RevId: 627788524
2024-04-24 11:11:42 -07:00
Copybara-Service b27d8d6b92 Merge pull request #156 from zeerd:dev
PiperOrigin-RevId: 627706909
2024-04-24 06:19:14 -07:00
Charles Chan ea45d7c4d7 Use a lambda to split the function, and let stream_token break out of prefill, too 2024-04-23 22:55:01 +08:00
Paul Chang e8d29792ac New token validity assertions, improve prompt truncation warning
PiperOrigin-RevId: 627376194
2024-04-23 07:05:59 -07:00
Jan Wassenberg 3bf22abb22 Fix sign comparison warnings
PiperOrigin-RevId: 627299902
2024-04-23 01:16:51 -07:00
Jan Wassenberg ca971ef50f Document weight conversion
PiperOrigin-RevId: 626957718
2024-04-22 01:58:30 -07:00
Jan Wassenberg e9a0caed87 Further improve IO, enable multiple backends without -D.
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.

Plus lint fixes.

PiperOrigin-RevId: 626279004
2024-04-19 00:40:29 -07:00
Paul Chang 38f1ea9b80 Eliminate redundant copies of TokenString()
Move this function outside of HWY_NAMESPACE since it doesn't need to be
optimized for any particular architecture.

PiperOrigin-RevId: 626098641
2024-04-18 11:31:50 -07:00
Jan Wassenberg a8ceb75f43 Improved IO abstraction layer
Move to unique_ptr-like File class.
Move `if OS_WIN` into wrapper functions.
exists -> Exists.

PiperOrigin-RevId: 625923056
2024-04-17 23:15:07 -07:00
Jan Wassenberg a939b5fc9f Update distortion.h to weighted average, add distortion_test.
More thorough checks in sfp_test and nuq_test.
nuq_test: use deterministic input generator.

PiperOrigin-RevId: 625602019
2024-04-17 01:44:19 -07:00
Copybara-Service 05e7e2b2bb Merge pull request #145 from atorero:dev
PiperOrigin-RevId: 624221085
2024-04-12 10:27:18 -07:00
Andrey Mikhaylov 4ef3da733a Fixed minor things and added comments. 2024-04-12 15:39:16 +00:00
Andrey Mikhaylov 2c5706f159 Add comments regarding layers output usage. 2024-04-12 15:39:16 +00:00
Andrey Mikhaylov 03284d752e Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file. 2024-04-12 15:39:16 +00:00
Copybara-Service 342e998cb6 Merge pull request #142 from ufownl:refactor/data_structures
PiperOrigin-RevId: 623503486
2024-04-10 08:35:18 -07:00
RangerUFO e541707caa Rename the fields of Griffin weights 2024-04-10 21:04:31 +08:00
RangerUFO 4e960d67f6 Fix typos 2024-04-10 20:38:18 +08:00
RangerUFO 809bd0709d Refactor data structures to reduce memory usage 2024-04-10 19:35:23 +08:00
Jan Wassenberg 54120a5571 Mention Makefile contributed by @jart
PiperOrigin-RevId: 623436818
2024-04-10 03:21:10 -07:00
Jan Wassenberg 881eeffe0a Lint fixes: strcat, includes, arg naming
PiperOrigin-RevId: 623435210
2024-04-10 03:12:41 -07:00
Copybara-Service da91f4c4be Merge pull request #137 from zond:main
PiperOrigin-RevId: 623255639
2024-04-09 12:57:57 -07:00
Copybara-Service 827fec1904 Merge pull request #139 from ufownl:feature/public_layers
PiperOrigin-RevId: 623254705
2024-04-09 12:54:23 -07:00
RangerUFO 2099b37732 Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs 2024-04-09 20:44:41 +08:00
Jan Wassenberg a982ec1287 Move code to gemma/ so we can remove error-prone copybara: comments.
Also fix includes and Lint warnings.

PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
zond 9ca662dc14 Clarified README
Made it more visible that the recurrent weights are at a different Kaggle page.
2024-04-09 09:58:47 +02:00
Copybara-Service 83dd08ac87 Merge pull request #136 from pculliton:griffin
PiperOrigin-RevId: 623054233
2024-04-08 22:29:24 -07:00
Luca Versari 9c3f969405 Implement the Griffin model.
Also implement support for some model variations:

- Local attention.
- Add support for biases.
- Use RoPE only on half vectors.
- Support different order of QKV weights.

Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Martin Bruse <zondolfin@gmail.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-08 21:45:54 +02:00
Jan Wassenberg 4326249d0a Fix includes
PiperOrigin-RevId: 622456877
2024-04-06 09:27:09 -07:00
Jan Wassenberg a3a0f78fda Merge pull request #131 from veluca93:benchmark-and-test
PiperOrigin-RevId: 622452794
2024-04-06 18:06:03 +02:00
Jan Wassenberg 9e51a91cfc Faster bazel builds by only building all local targets.
PiperOrigin-RevId: 622442126
2024-04-06 18:05:49 +02:00
Luca Versari 5862d1f995 Add a benchmark and additional tests.
Also add a script to help running sanitizer builds, and do some cleanup.

Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Sami Boukortt <sboukortt@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 12:54:52 +02:00
Jan Wassenberg d852cf5089 Remove unused includes
PiperOrigin-RevId: 622412150
2024-04-06 03:13:43 -07:00
Copybara-Service 325ef06cf9 Merge pull request #130 from veluca93:weight-handling
PiperOrigin-RevId: 622405491
2024-04-06 02:22:00 -07:00
Luca Versari 4c23932289 Improve weight handling.
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros

Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
Copybara-Service 280b8cb8a1 Merge pull request #129 from veluca93:more-ops
PiperOrigin-RevId: 622145499
2024-04-05 05:02:00 -07:00
Luca Versari 6cdb8a45a0 Add more ops: Sigmoid, (Two)MatVecAdd. Faster TwoMatVec.
drive-by: some build system simplifications

Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Lode Vandevenne <lode@google.com>
Co-authored-by: Martin Bruse <zondolfin@gmail.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-05 12:27:31 +02:00