gemma.cpp/gemma
Jan Wassenberg b831fa8482 1.3x prefill, 0.95x decode: matmul replacing last matvec
Before: prefill 38.28, decode 9.17 (with profiler enabled, prompt = 330 tok)
```
Gen.FFW                                 :      15414 x         4692352 = 24.166318
Gen.Attention.SumHeads                  :      15414 x         1394804 =  7.183451 !!
Gen.Embedding                           :        361 x        49961894 =  6.026297
Gen.Attention.QKV                       :      15414 x         1005125 =  5.176546
Gen.Attention.DotSoftmax                :      15414 x          885480 =  4.560357
RopeAndMulBy                            :     696528 x           11867 =  2.761818
```

After: prefill 49.80, decode 8.68
```
Gen.FFW                                 :      14448 x         5312783 = 25.646868
Gen.Embedding                           :        338 x        63044815 =  7.119845
Gen.Attention.QKV                       :      14448 x         1115003 =  5.382557
Gen.Attention.DotSoftmax                :      14448 x          897577 =  4.332957
RopeAndMulBy                            :     673344 x           11886 =  2.674156
Gen.Attention.SumHeads                  :      14448 x          518291 =  2.501993 !!
```
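
The headline ratios can be checked against the numbers above. This is a minimal sketch, assuming that the two figures on the Before and After lines are prefill and decode throughput in that order (as the commit title suggests), and that the middle profiler column is the average cost per call. `Speedup` and `PerCallRatio` are hypothetical helpers for illustration, not part of gemma.cpp.

```cpp
#include <cassert>
#include <cmath>

// Throughput ratio: values above 1.0 mean the "after" build is faster.
// Assumption: inputs are throughput figures (higher is better).
constexpr double Speedup(double before, double after) {
  return after / before;
}

// Cost ratio for one profiler zone: values above 1.0 mean each call
// became cheaper. Assumption: inputs are average per-call costs taken
// from the middle column of the profiler output (lower is better).
constexpr double PerCallRatio(double before_cost, double after_cost) {
  return before_cost / after_cost;
}

// Figures copied from the commit message.
constexpr double kPrefill = Speedup(38.28, 49.80);  // roughly 1.30
constexpr double kDecode = Speedup(9.17, 8.68);     // roughly 0.95
// Gen.Attention.SumHeads, the zone marked "!!" in both listings.
constexpr double kSumHeads = PerCallRatio(1394804.0, 518291.0);
```

Under these assumptions the prefill and decode ratios match the 1.3x and 0.95x in the commit title, and the per-call cost of `Gen.Attention.SumHeads` drops by a factor of about 2.7.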
PiperOrigin-RevId: 662024085
2024-08-12 03:36:01 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| evals | Add MMLU eval to github | 2024-05-20 10:20:53 -07:00 |
| instantiations | Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. | 2024-08-09 02:09:06 -07:00 |
| activations.h | 1.3x prefill, 0.95x decode: matmul replacing last matvec | 2024-08-12 03:36:01 -07:00 |
| common.cc | Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. | 2024-08-09 02:09:06 -07:00 |
| common.h | Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. | 2024-08-09 02:09:06 -07:00 |
| configs.h | Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. | 2024-08-09 02:09:06 -07:00 |
| gemma-inl.h | 1.3x prefill, 0.95x decode: matmul replacing last matvec | 2024-08-12 03:36:01 -07:00 |
| gemma.cc | 1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism. | 2024-08-05 18:50:09 -07:00 |
| gemma.h | 1.3x prefill, 0.95x decode: matmul replacing last matvec | 2024-08-12 03:36:01 -07:00 |
| kv_cache.cc | Major revamp #2 of Prefill: fix token order, parallel for multi-query | 2024-07-25 03:28:55 -07:00 |
| kv_cache.h | Major revamp #2 of Prefill: fix token order, parallel for multi-query | 2024-07-25 03:28:55 -07:00 |
| run.cc | Add pin flag to disable pinning. Refs #338 | 2024-08-09 13:47:12 -07:00 |
| tokenizer.cc | 7x compile time speedup: shard gemma.cc | 2024-07-03 06:35:04 -07:00 |
| tokenizer.h | 7x compile time speedup: shard gemma.cc | 2024-07-03 06:35:04 -07:00 |
| weights.cc | 1.3x prefill, 0.95x decode: matmul replacing last matvec | 2024-08-12 03:36:01 -07:00 |
| weights.h | 1.3x prefill, 0.95x decode: matmul replacing last matvec | 2024-08-12 03:36:01 -07:00 |