mirror of https://github.com/google/gemma.cpp.git
We compute all three attention projections (q, k, and v) with a single MatVec
and then copy the kv part of the result to the cache.
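
As a rough illustration of the idea (not the actual gemma.cpp code), the sketch below handles a single head and assumes the q, k, and v weight rows are stacked in that order in one row-major matrix. `MatVecSketch`, `FusedQkvSketch`, the stacked weight layout, and the cache layout (k followed by v for each position) are all hypothetical names and assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical reference MatVec: out[r] = sum over c of w[r * cols + c] * x[c].
// A plain loop stands in for the optimized, multi-threaded version.
void MatVecSketch(const std::vector<float>& w, const std::vector<float>& x,
                  std::size_t rows, std::size_t cols,
                  std::vector<float>& out) {
  assert(w.size() == rows * cols && x.size() == cols);
  out.assign(rows, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) sum += w[r * cols + c] * x[c];
    out[r] = sum;
  }
}

// Computes q, k, and v for one head with a single MatVec, then copies the
// contiguous kv part into the cache slot for position `pos`.
// Assumed weight layout: rows [0, qkv_dim) are q, [qkv_dim, 2 * qkv_dim) are k,
// and [2 * qkv_dim, 3 * qkv_dim) are v.
// Assumed cache layout: 2 * qkv_dim floats per position, k followed by v.
void FusedQkvSketch(const std::vector<float>& w_qkv,
                    const std::vector<float>& x, std::size_t model_dim,
                    std::size_t qkv_dim, std::size_t pos,
                    std::vector<float>& kv_cache, std::vector<float>& q_out) {
  const std::size_t kv_size = 2 * qkv_dim;
  assert(kv_cache.size() >= (pos + 1) * kv_size);

  // One MatVec over the stacked weights yields [q | k | v].
  std::vector<float> qkv;
  MatVecSketch(w_qkv, x, 3 * qkv_dim, model_dim, qkv);

  // q stays local to the current step.
  q_out.assign(qkv.begin(), qkv.begin() + qkv_dim);

  // k and v are adjacent in the output, so a single copy moves both.
  std::memcpy(kv_cache.data() + pos * kv_size, qkv.data() + qkv_dim,
              kv_size * sizeof(float));
}
```

Under these assumptions, one larger MatVec replaces three separate calls, and the only extra work is a copy of `2 * qkv_dim` floats per position.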
Benchmark results for the 7b-it model, which uses MHA blocks (summarization
with 1600 tokens for prefill and essay writing with 500 tokens for generation):
```
               Prefill speed             Generation speed
Num threads    BEFORE       AFTER        BEFORE       AFTER
32             13.75 t/s    14.80 t/s     9.22 t/s     9.77 t/s
64             19.89 t/s    24.83 t/s    12.46 t/s    13.66 t/s
```
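
To read the table as relative gains, the throughput numbers can be turned into percentages. The helper below is a small sketch for that arithmetic; `SpeedupPercent` is a hypothetical name and is not part of the repository.

```cpp
#include <cassert>

// Relative gain of `after` over `before`, in percent.
// Both arguments are throughputs in tokens per second.
double SpeedupPercent(double before, double after) {
  assert(before > 0.0);
  return (after / before - 1.0) * 100.0;
}
```

Applied to the table, this gives roughly 7.6% (prefill) and 6.0% (generation) at 32 threads, and roughly 24.8% (prefill) and 9.6% (generation) at 64 threads.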
| File |
|---|
| benchmark.cc |
| compress_weights.cc |
| configs.h |
| gemma.cc |
| gemma.h |
| gemma_test.cc |
| ops.h |
| ops_test.cc |
| run.cc |