gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Zoltan Szabadka	0afa480d90	Use more parallelism in the final output of the attention block. We use MatVec instead of MatVecLoop for the per-head dense layers, because we can parallelize more on the rows of the matrix than on the number of heads. This will be even more efficient after we rearrange the weights and can have a single MatVec operation. Benchmark results (summarization with 1600 tokens for prefill and essay writing with 500 tokens for generation):
```
               Prefill speed             Generation speed
Num threads    BEFORE      AFTER         BEFORE      AFTER
32             58.24 t/s   61.79 t/s     32.11 t/s   32.62 t/s
64             83.62 t/s   92.00 t/s     41.10 t/s   41.80 t/s
```	2024-05-02 09:30:07 +00:00
Jan Wassenberg	12fb2f05cf	Add per-thread even_odd storage for #166. Also inline ProjQ and ProjKV lambdas, add missing includes/deps for ops_test. PiperOrigin-RevId: 629460608	2024-04-30 10:42:23 -07:00
Zoltan Szabadka	f8ccb8e37c	Fix kv offset computation for MHA config.	2024-04-30 16:19:14 +00:00
Zoltan Szabadka	afaca4efa8	Use more parallelism in the QKV projections in MQA mode. Instead of MatVecLoop, we use MatVec and we combine k and v into one 2 * kQKVDim long vector so that K and V projections can be combined into one MatVec operation. Benchmark results (summarization with 1600 tokens for prefill and essay writing with 500 tokens for generation):
```
               Prefill speed             Generation speed
Num threads    BEFORE      AFTER         BEFORE      AFTER
4               9.81 t/s    9.96 t/s      8.39 t/s    8.46 t/s
18             31.50 t/s   36.67 t/s     23.10 t/s   25.83 t/s
32             45.36 t/s   58.91 t/s     27.60 t/s   31.25 t/s
64             57.72 t/s   80.64 t/s     35.40 t/s   39.76 t/s
```	2024-04-30 13:10:14 +00:00
Zoltan Szabadka	27117cc39f	Simplify threading: remove the use of inner_pool. We only used inner_pool in the prefill FFW function, and there we can achieve sufficient parallelism on the rows of the matrix-vector multiplications. Benchmark results on a 1600-token summarization task:
```
               Prefill speed
Num threads    BEFORE      AFTER
4               9.24 t/s    9.76 t/s
18             31.41 t/s   31.16 t/s
32             31.41 t/s   45.13 t/s
64             31.03 t/s   57.85 t/s
```	2024-04-29 16:07:30 +00:00
Paul Chang	2d4de6b08b	Support absolute positional embeddings from vanilla transformer PiperOrigin-RevId: 628100831	2024-04-25 09:32:14 -07:00
Paul Chang	75eca87039	Simplify prefill early-exit (originally Merge #156) PiperOrigin-RevId: 627788524	2024-04-24 11:11:42 -07:00
Charles Chan	ea45d7c4d7	Use a lambda to split the function and make stream_token able to break out of prefill, too	2024-04-23 22:55:01 +08:00
Paul Chang	e8d29792ac	New token validity assertions, improve prompt truncation warning PiperOrigin-RevId: 627376194	2024-04-23 07:05:59 -07:00
Jan Wassenberg	3bf22abb22	Fix sign comparison warnings PiperOrigin-RevId: 627299902	2024-04-23 01:16:51 -07:00
Jan Wassenberg	e9a0caed87	Further improve IO, enable multiple backends without -D. Move Path into io.h and use for opening files. Removes dependency of gemma_lib on args. Separate Windows codepath instead of emulating POSIX functions. Plus lint fixes. PiperOrigin-RevId: 626279004	2024-04-19 00:40:29 -07:00
Paul Chang	38f1ea9b80	Eliminate redundant copies of TokenString() Move this function outside of HWY_NAMESPACE since it doesn't need to be optimized for any particular architecture. PiperOrigin-RevId: 626098641	2024-04-18 11:31:50 -07:00
Jan Wassenberg	a8ceb75f43	Improved IO abstraction layer Move to unique_ptr-like File class. Move `if OS_WIN` into wrapper functions. exists -> Exists. PiperOrigin-RevId: 625923056	2024-04-17 23:15:07 -07:00
Andrey Mikhaylov	4ef3da733a	Fixed minor things and added comments.	2024-04-12 15:39:16 +00:00
Andrey Mikhaylov	03284d752e	Added layer output functionality to gemma and a binary debug_output to save the outputs to a JSON file.	2024-04-12 15:39:16 +00:00
RangerUFO	e541707caa	Rename the fields of Griffin weights	2024-04-10 21:04:31 +08:00
RangerUFO	4e960d67f6	Fix typos	2024-04-10 20:38:18 +08:00
RangerUFO	809bd0709d	Refactor data structures to reduce memory usage	2024-04-10 19:35:23 +08:00
Jan Wassenberg	881eeffe0a	Lint fixes: strcat, includes, arg naming PiperOrigin-RevId: 623435210	2024-04-10 03:12:41 -07:00
RangerUFO	2099b37732	Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs	2024-04-09 20:44:41 +08:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00
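Commits 0afa480d90 and afaca4efa8 replace per-head MatVecLoop calls with a single MatVec that is parallelized over matrix rows, and 27117cc39f relies on the same row-level parallelism. The sketch below illustrates that idea under stated assumptions: `MatVecRowParallel` and `ProjectKV` are hypothetical names, threading uses plain `std::thread` rather than anything from gemma.cpp, and the stacked `[K; V]` weight layout is an assumption that mirrors the "one 2 * kQKVDim long vector" described in afaca4efa8. It is not gemma.cpp's actual MatVec.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch (not gemma.cpp's MatVec): out = W * x for a row-major
// W of size rows x cols. Work is split across threads by *rows*, so the
// number of tasks scales with the matrix height rather than the head count.
// Each thread writes a disjoint slice of `out`, so no locking is needed.
void MatVecRowParallel(const std::vector<float>& W, const std::vector<float>& x,
                       size_t rows, size_t cols, std::vector<float>& out,
                       size_t num_threads) {
  assert(W.size() == rows * cols && x.size() == cols);
  num_threads = std::max<size_t>(1, num_threads);  // avoid division by zero
  out.assign(rows, 0.0f);
  const size_t chunk = (rows + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (size_t t = 0; t < num_threads; ++t) {
    const size_t begin = t * chunk;
    const size_t end = std::min(rows, begin + chunk);
    if (begin >= end) break;  // more threads than rows: stop spawning
    workers.emplace_back([&, begin, end] {
      for (size_t r = begin; r < end; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) sum += W[r * cols + c] * x[c];
        out[r] = sum;
      }
    });
  }
  for (auto& w : workers) w.join();
}

// Hypothetical: K weights (qkv_dim x model_dim) are assumed to be stacked
// directly above V weights (qkv_dim x model_dim), so a single row-parallel
// MatVec yields one 2 * qkv_dim vector: [0, qkv_dim) is k and
// [qkv_dim, 2 * qkv_dim) is v.
std::vector<float> ProjectKV(const std::vector<float>& kv_weights,
                             const std::vector<float>& x, size_t qkv_dim,
                             size_t model_dim, size_t num_threads) {
  std::vector<float> kv;
  MatVecRowParallel(kv_weights, x, 2 * qkv_dim, model_dim, kv, num_threads);
  return kv;
}
```

The design point, as the commit messages state it, is that a per-head loop offers at most one task per head (for a hypothetical 8-head layer, only 8 tasks for 64 threads), whereas a matrix with thousands of rows can be divided evenly among all threads, and fusing K and V doubles the rows available to one call.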


121 Commits