gemma.cpp/gemma
Zoltan Szabadka 27117cc39f Simplify threading: remove the use of inner_pool.
We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.
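
A minimal sketch of the idea, not the gemma.cpp implementation: a matrix-vector product parallelized over rows using one flat set of workers, with no nested (inner) pool. The helper name `MatVecParallelRows`, the row-major layout, and the use of `std::thread` as a stand-in for the project's own thread pool are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, not part of gemma.cpp: computes out = mat * vec for a
// row-major matrix with `rows` rows and `cols` columns. The rows are split
// into contiguous chunks, one chunk per worker thread.
std::vector<float> MatVecParallelRows(const std::vector<float>& mat,
                                      const std::vector<float>& vec,
                                      size_t rows, size_t cols,
                                      size_t num_threads) {
  assert(mat.size() == rows * cols);
  assert(vec.size() == cols);
  std::vector<float> out(rows, 0.0f);
  if (rows == 0) return out;

  // Never start more workers than there are rows, and always at least one.
  num_threads = std::max<size_t>(1, std::min(num_threads, rows));
  const size_t rows_per_thread = (rows + num_threads - 1) / num_threads;

  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = t * rows_per_thread;
    const size_t end = std::min(rows, begin + rows_per_thread);
    if (begin >= end) break;  // Remaining chunks would be empty.
    // Each worker writes a disjoint range of `out`, so no locking is needed.
    workers.emplace_back([&mat, &vec, &out, cols, begin, end] {
      for (size_t r = begin; r < end; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) sum += mat[r * cols + c] * vec[c];
        out[r] = sum;
      }
    });
  }
  for (std::thread& w : workers) w.join();
  return out;
}
```

Because every output row is independent, a matrix with many rows gives each thread its own share of the work, which is why a second level of parallelism inside each row may be unnecessary.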

Benchmark results on a 1600-token summarization task:

```
               Prefill speed
Num threads    BEFORE         AFTER
4               9.24 t/s       9.76 t/s
18             31.41 t/s      31.16 t/s
32             31.41 t/s      45.13 t/s
64             31.03 t/s      57.85 t/s
```
2024-04-29 16:07:30 +00:00
benchmark.cc Simplify threading: remove the use of inner_pool. 2024-04-29 16:07:30 +00:00
compress_weights.cc Improve documentation for compress_weights flags 2024-04-29 06:49:50 -07:00
configs.h Support absolute positional embeddings from vanilla transformer 2024-04-25 09:32:14 -07:00
gemma.cc Simplify threading: remove the use of inner_pool. 2024-04-29 16:07:30 +00:00
gemma.h Simplify threading: remove the use of inner_pool. 2024-04-29 16:07:30 +00:00
gemma_test.cc Simplify threading: remove the use of inner_pool. 2024-04-29 16:07:30 +00:00
ops.h Move code to gemma/ so we can remove error-prone copybara: comments. 2024-04-09 04:45:42 -07:00
ops_test.cc Move code to gemma/ so we can remove error-prone copybara: comments. 2024-04-09 04:45:42 -07:00
run.cc Simplify threading: remove the use of inner_pool. 2024-04-29 16:07:30 +00:00