Commit Graph

62 Commits

Author SHA1 Message Date
Apoorv Reddy 7f4b85d00b Add MMLU eval to github
PiperOrigin-RevId: 635495178
2024-05-20 10:20:53 -07:00
Apoorv Reddy 8e641eb4cd Add TTFT to TimingInfo
PiperOrigin-RevId: 634378994
2024-05-16 07:16:53 -07:00
Apoorv Reddy eb0b96e0a8 Pass most runtime parameters using const RuntimeConfig&
PiperOrigin-RevId: 633572507
2024-05-14 07:04:53 -07:00
Apoorv Reddy f1eab987d8 Store tokens/sec in auxiliary struct TimingInfo.
PiperOrigin-RevId: 633108908
2024-05-13 00:04:19 -07:00
Zoltan Szabadka afaca4efa8 Use more parallelism in the QKV projections in MQA mode.
Instead of MatVecLoop, we use MatVec and we combine k and v
into one 2 * kQKVDim long vector so that K and V projections
can be combined into one MatVec operation.

Benchmark results (summarization with 1600 tokens for prefill
and essay writing with 500 tokens for generation):

```
                   Prefill speed                Generation speed
Num threads      BEFORE       AFTER            BEFORE       AFTER
4                 9.81 t/s     9.96 t/s       8.39 t/s     8.46 t/s
18               31.50 t/s    36.67 t/s      23.10 t/s    25.83 t/s
32               45.36 t/s    58.91 t/s      27.60 t/s    31.25 t/s
64               57.72 t/s    80.64 t/s      35.40 t/s    39.76 t/s
```
2024-04-30 13:10:14 +00:00
Zoltan Szabadka 27117cc39f Simplify threading: remove the use of inner_pool.
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.

Benchmark results on a 1600-token summarization task:

```
               Prefill speed
Num threads    BEFORE         AFTER
4               9.24 t/s       9.76 t/s
18             31.41 t/s      31.16 t/s
32             31.41 t/s      45.13 t/s
64             31.03 t/s      57.85 t/s
```
2024-04-29 16:07:30 +00:00
Jan Wassenberg e9a0caed87 Further improve IO, enable multiple backends without -D.
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.

Plus lint fixes.

PiperOrigin-RevId: 626279004
2024-04-19 00:40:29 -07:00
Andrey Mikhaylov 4ef3da733a Fixed minor things and added comments. 2024-04-12 15:39:16 +00:00
Andrey Mikhaylov 2c5706f159 Add comments regarding layers output usage. 2024-04-12 15:39:16 +00:00
Andrey Mikhaylov 03284d752e Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file. 2024-04-12 15:39:16 +00:00
RangerUFO 2099b37732 Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs 2024-04-09 20:44:41 +08:00
Jan Wassenberg a982ec1287 Move code to gemma/ so we can remove error-prone copybara: comments.
Also fix includes and Lint warnings.

PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
Renamed from gemma.h (Browse further)