mirror of https://github.com/google/gemma.cpp.git
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.
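
As a rough illustration of why row-level parallelism can be enough on its own, here is a minimal sketch of a matrix-vector product whose rows are split across threads. It is not the gemma.cpp implementation: `MatVecRowParallel` is a hypothetical helper, it uses plain `std::thread` rather than the project's thread pool, and it assumes a dense row-major `float` matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of gemma.cpp): computes y = W * x, where W is
// a rows x cols matrix stored row-major in `w`. Rows are divided into
// contiguous chunks and each chunk is handled by one thread.
std::vector<float> MatVecRowParallel(const std::vector<float>& w,
                                     const std::vector<float>& x,
                                     size_t rows, size_t cols,
                                     size_t num_threads) {
  assert(w.size() == rows * cols);
  assert(x.size() == cols);

  std::vector<float> y(rows, 0.0f);

  // Never start more threads than there are rows, and always at least one.
  num_threads = std::max<size_t>(1, std::min(num_threads, rows));
  const size_t rows_per_thread = (rows + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = std::min(rows, t * rows_per_thread);
    const size_t end = std::min(rows, begin + rows_per_thread);
    // Each thread writes a disjoint range of y, so no locking is required.
    workers.emplace_back([&w, &x, &y, cols, begin, end] {
      for (size_t r = begin; r < end; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) sum += w[r * cols + c] * x[c];
        y[r] = sum;
      }
    });
  }
  for (std::thread& worker : workers) worker.join();
  return y;
}
```

Because every output row is independent, a single level of parallelism over rows keeps all threads busy as long as the matrix has many more rows than there are threads, which is the situation the message above describes for the prefill FFW step.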
Benchmark results on a 1600-token summarization task:
```
Prefill speed
Num threads    BEFORE       AFTER
          4     9.24 t/s     9.76 t/s
         18    31.41 t/s    31.16 t/s
         32    31.41 t/s    45.13 t/s
         64    31.03 t/s    57.85 t/s
```
| File |
|---|
| benchmark.cc |
| compress_weights.cc |
| configs.h |
| gemma.cc |
| gemma.h |
| gemma_test.cc |
| ops.h |
| ops_test.cc |
| run.cc |