gemma.cpp

Zoltan Szabadka 27117cc39f	2024-04-29 16:07:30 +00:00

Simplify threading: remove the use of inner_pool.

We only used inner_pool in the prefill FFW function, and there we can achieve sufficient parallelism on the rows of the matrix-vector multiplications.

Benchmark results on a 1600-token summarization task:

```
Prefill speed

Num threads   BEFORE       AFTER
 4             9.24 t/s     9.76 t/s
18            31.41 t/s    31.16 t/s
32            31.41 t/s    45.13 t/s
64            31.03 t/s    57.85 t/s
```
benchmark.cc	Simplify threading: remove the use of inner_pool.	2024-04-29 16:07:30 +00:00
compress_weights.cc	Improve documentation for compress_weights flags	2024-04-29 06:49:50 -07:00
configs.h	Support absolute positional embeddings from vanilla transformer	2024-04-25 09:32:14 -07:00
gemma.cc	Simplify threading: remove the use of inner_pool.	2024-04-29 16:07:30 +00:00
gemma.h	Simplify threading: remove the use of inner_pool.	2024-04-29 16:07:30 +00:00
gemma_test.cc	Simplify threading: remove the use of inner_pool.	2024-04-29 16:07:30 +00:00
ops.h	Move code to gemma/ so we can remove error-prone copybara: comments.	2024-04-09 04:45:42 -07:00
ops_test.cc	Move code to gemma/ so we can remove error-prone copybara: comments.	2024-04-09 04:45:42 -07:00
run.cc	Simplify threading: remove the use of inner_pool.	2024-04-29 16:07:30 +00:00