424 Commits

Author SHA1 Message Date
Sam Kaufman 6a78a23f4c Abstracted some duplicated MatVecAdd specializations. 2024-04-29 16:23:38 -07:00
Sam Kaufman f608337fef Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo. 2024-04-29 14:13:07 -07:00
Sam Kaufman aa0b113214 (VecT*) to static_cast<VecT*>. 2024-04-29 12:53:47 -07:00
Sam Kaufman 5cb63346aa supports_eo -> kSupportsEvenOdd 2024-04-29 12:51:35 -07:00
Zoltan Szabadka 27117cc39f Simplify threading: remove the use of inner_pool.
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.

Benchmark results on a 1600-token summarization task:

```
               Prefill speed
Num threads    BEFORE         AFTER
4               9.24 t/s       9.76 t/s
18             31.41 t/s      31.16 t/s
32             31.41 t/s      45.13 t/s
64             31.03 t/s      57.85 t/s
```
2024-04-29 16:07:30 +00:00
Paul Chang 1d18c5a129 Improve documentation for compress_weights flags
PiperOrigin-RevId: 629053191
2024-04-29 06:49:50 -07:00
Sam Kaufman 0816a1070d Even-odd layout MatVecs for bf16 weights. 2024-04-28 20:09:25 -07:00
Paul Chang 2d4de6b08b Support absolute positional embeddings from vanilla transformer
PiperOrigin-RevId: 628100831
2024-04-25 09:32:14 -07:00
Paul Chang 75eca87039 Simplify prefill early-exit (originally Merge #156)
PiperOrigin-RevId: 627788524
2024-04-24 11:11:42 -07:00
Charles Chan ea45d7c4d7 Use a lambda to split the function, and allow stream_token to break out of prefill, too 2024-04-23 22:55:01 +08:00
Paul Chang e8d29792ac New token validity assertions, improve prompt truncation warning
PiperOrigin-RevId: 627376194
2024-04-23 07:05:59 -07:00
Jan Wassenberg 3bf22abb22 Fix sign comparison warnings
PiperOrigin-RevId: 627299902
2024-04-23 01:16:51 -07:00
Jan Wassenberg e9a0caed87 Further improve IO, enable multiple backends without -D.
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.

Plus lint fixes.

PiperOrigin-RevId: 626279004
2024-04-19 00:40:29 -07:00
Paul Chang 38f1ea9b80 Eliminate redundant copies of TokenString()
Move this function outside of HWY_NAMESPACE since it doesn't need to be
optimized for any particular architecture.

PiperOrigin-RevId: 626098641
2024-04-18 11:31:50 -07:00
Jan Wassenberg a8ceb75f43 Improved IO abstraction layer
Move to unique_ptr-like File class.
Move `if OS_WIN` into wrapper functions.
exists -> Exists.

PiperOrigin-RevId: 625923056
2024-04-17 23:15:07 -07:00
Andrey Mikhaylov 4ef3da733a Fixed minor things and added comments. 2024-04-12 15:39:16 +00:00
Andrey Mikhaylov 2c5706f159 Add comments regarding layers output usage. 2024-04-12 15:39:16 +00:00
Andrey Mikhaylov 03284d752e Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file. 2024-04-12 15:39:16 +00:00
RangerUFO e541707caa Rename the fields of Griffin weights 2024-04-10 21:04:31 +08:00
RangerUFO 4e960d67f6 Fix typos 2024-04-10 20:38:18 +08:00
RangerUFO 809bd0709d Refactor data structures to reduce memory usage 2024-04-10 19:35:23 +08:00
Jan Wassenberg 881eeffe0a Lint fixes: strcat, includes, arg naming
PiperOrigin-RevId: 623435210
2024-04-10 03:12:41 -07:00
RangerUFO 2099b37732 Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs 2024-04-09 20:44:41 +08:00
Jan Wassenberg a982ec1287 Move code to gemma/ so we can remove error-prone copybara: comments.
Also fix includes and Lint warnings.

PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00