Commit Graph

22 Commits

Author SHA1 Message Date
Ray Smith e0afdfa8fb Added bias vector addition to MatMul
PiperOrigin-RevId: 643385381
2024-06-14 10:25:16 -07:00
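For context, a bias add fused into a matmul looks like the following scalar sketch. Names and layout are illustrative, not gemma.cpp's actual Highway-vectorized kernel; the bias vector has one entry per output column and is broadcast across rows:

```
#include <cstddef>

// Minimal scalar sketch: C = A * B + bias, with one bias value per output
// column broadcast across all rows. Illustrative only; the real kernel is
// vectorized and tiled.
void MatMulBias(const float* A, const float* B, const float* bias,
                float* C, size_t rows, size_t inner, size_t cols) {
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      float sum = bias[c];  // start the accumulator from the bias, not zero
      for (size_t k = 0; k < inner; ++k) {
        sum += A[r * inner + k] * B[k * cols + c];
      }
      C[r * cols + c] = sum;
    }
  }
}
```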
Jan Wassenberg 29c0c574e6 Integrate matmul into FFW: 4.3x prefill speedup
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec

after, bf16:
116.496 prefill tokens / sec
17.5391 tokens / sec
```

PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Ray Smith 198326a682 Removed now redundant non-batch matmul
PiperOrigin-RevId: 643317187
2024-06-14 05:13:36 -07:00
Andrey Vlasov b17631c95f Implement a missing (bf16, f32) tiled MatMul kernel.
PiperOrigin-RevId: 643313676
2024-06-14 04:54:40 -07:00
Ray Smith ea525da967 Added MatMul_4x4_Batch, which is MatMul_4x4 with the first template argument moved to the first function argument so that the batch size (the number of A rows) can vary at run time.
PiperOrigin-RevId: 643017973
2024-06-13 09:05:40 -07:00
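The signature change described here, sketched in outline (parameter names are illustrative; the real functions are templated on element types and use Highway vectors):

```
#include <cstddef>

// Before (hypothetical shape): the batch size is fixed at compile time.
template <size_t kNumRowsA>
void MatMul_4x4(const float* A, const float* B, float* C,
                size_t inner, size_t cols);

// After: the first template argument becomes the first function argument,
// so the number of A rows (the batch size) can vary at run time.
void MatMul_4x4_Batch(size_t num_rows_a, const float* A, const float* B,
                      float* C, size_t inner, size_t cols);
```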
The gemma.cpp Authors 1b40619864 Increase parallelism in ops_test
PiperOrigin-RevId: 643013415
2024-06-13 08:50:41 -07:00
Andrey Vlasov 38eb452b94 Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in the (f32, sfp) tiled MatMul.

PiperOrigin-RevId: 642901844
2024-06-13 02:07:21 -07:00
The gemma.cpp Authors f467670de7 Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB-sized chunks of the second matrix.
PiperOrigin-RevId: 642533996
2024-06-12 01:11:59 -07:00
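An illustrative sketch of the same idea: decompress B a few rows at a time into a small f32 scratch buffer and accumulate, instead of materializing all of B in f32. The decoder below is a placeholder, not the real SfpStream format, and the exact chunk shape depends on B's storage order (the commit chunks 4 * kColsA_RowsB elements):

```
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder decoder, NOT the real SfpStream format: the actual decode
// expands an 8-bit switched-floating-point code into f32.
void DecompressSfp(const uint8_t* compressed, size_t n, float* out) {
  for (size_t i = 0; i < n; ++i) out[i] = static_cast<float>(compressed[i]);
}

// Chunked strategy: only a small f32 scratch buffer is live at any time.
// B is stored with one byte per element.
void MatVecOverCompressedB(const float* a, const uint8_t* b, size_t inner,
                           size_t cols, float* out) {
  constexpr size_t kChunkRows = 4;  // a few rows of B per decompress step
  std::vector<float> scratch(kChunkRows * cols);
  for (size_t c = 0; c < cols; ++c) out[c] = 0.0f;
  for (size_t r0 = 0; r0 < inner; r0 += kChunkRows) {
    const size_t rows = std::min(kChunkRows, inner - r0);
    DecompressSfp(b + r0 * cols, rows * cols, scratch.data());
    for (size_t r = 0; r < rows; ++r) {
      for (size_t c = 0; c < cols; ++c) {
        out[c] += a[r0 + r] * scratch[r * cols + c];
      }
    }
  }
}
```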
Phil Culliton b6565e3bf6 Update AssertClose for large matrices and add large matrix test
PiperOrigin-RevId: 642277221
2024-06-11 08:22:47 -07:00
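For background on why such an update is needed: accumulated f32 rounding error in a length-k dot product grows with k, so a fixed epsilon that passes for small test matrices fails for large ones. An illustrative size-aware comparison (not gemma.cpp's actual AssertClose):

```
#include <algorithm>
#include <cmath>
#include <cstddef>

// Illustrative: the tolerance scales with the number of accumulated terms,
// since f32 rounding error in a length-k dot product grows roughly with k
// (or sqrt(k) for terms with random signs).
bool AssertClose(float expected, float actual, size_t k) {
  const float tolerance = 1e-6f * static_cast<float>(k) *
                          std::max(std::fabs(expected), 1.0f);
  return std::fabs(expected - actual) <= tolerance;
}
```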
Phil Culliton c5bcb5438c Fix for transpose matrix creation and additional tests
PiperOrigin-RevId: 641868053
2024-06-10 05:24:04 -07:00
Phil Culliton d985d8b867 Shifting large matrix init to heap in ops_test.cc
PiperOrigin-RevId: 641311100
2024-06-07 11:38:42 -07:00
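The motivation for the move, in brief: a large matrix as a function-local array can exceed the default thread stack (often 1 to 8 MiB), whereas heap allocation is safe. A minimal sketch:

```
#include <cstddef>
#include <memory>

void InitLargeMatrix() {
  constexpr size_t kN = 2048;  // 2048 * 2048 f32 = 16 MiB
  // float local[kN * kN];     // as a stack local: likely stack overflow
  auto mat = std::make_unique<float[]>(kN * kN);  // heap: fine
  for (size_t i = 0; i < kN * kN; ++i) mat[i] = 0.0f;
}
```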
Paul Chang 6c0be20fa6 Fix Softmax on SVE
PiperOrigin-RevId: 640947138
2024-06-06 10:39:30 -07:00
The gemma.cpp Authors 39d4115717 Implement mixed mode matmul: f32 * bf16
PiperOrigin-RevId: 640940962
2024-06-06 10:21:46 -07:00
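Mixed mode here means widening the bf16 operand to f32 on the fly: bf16 is the upper 16 bits of an IEEE-754 f32, so decoding is a 16-bit shift. A scalar sketch (the real kernel is vectorized via Highway):

```
#include <cstddef>
#include <cstdint>
#include <cstring>

// bf16 is the upper 16 bits of an IEEE-754 f32, so decoding is a shift.
inline float BF16ToF32(uint16_t v) {
  const uint32_t bits = static_cast<uint32_t>(v) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Scalar sketch of mixed-mode matmul: f32 A times bf16 B, widening B's
// elements to f32 on the fly so accumulation stays in full precision.
void MatMulF32BF16(const float* A, const uint16_t* B, float* C,
                   size_t rows, size_t inner, size_t cols) {
  for (size_t r = 0; r < rows; ++r) {
    for (size_t c = 0; c < cols; ++c) {
      float sum = 0.0f;
      for (size_t k = 0; k < inner; ++k) {
        sum += A[r * inner + k] * BF16ToF32(B[k * cols + c]);
      }
      C[r * cols + c] = sum;
    }
  }
}
```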
Phil Culliton e71d82ead9 Fix for GenerateZeroMat call in TestTiledMatMul
PiperOrigin-RevId: 640180868
2024-06-04 09:32:23 -07:00
Jan Wassenberg 4f9155d8c6 Add bf16 matmul support, update naming+test
Avoid int32, which can easily overflow for large matrices.
Also fix IDE warning in sfp-inl.

PiperOrigin-RevId: 640149845
2024-06-04 07:41:46 -07:00
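On the int32 note: a flattened index like row * cols + col exceeds INT32_MAX once a matrix has more than about 2^31 elements (e.g. 50000 x 50000), and signed overflow is undefined behavior, so indices should be size_t throughout. A minimal illustration:

```
#include <cstddef>

// With size_t arithmetic there is no 32-bit wraparound on 64-bit targets.
float At(const float* m, size_t row, size_t col, size_t cols) {
  return m[row * cols + col];
}

// Buggy variant for contrast: `int` multiplication can overflow, which is
// undefined behavior for signed integers.
// float AtBuggy(const float* m, int row, int col, int cols) {
//   return m[row * cols + col];
// }
```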
Phil Culliton c616abe628 Unrolled / tiled 4x4 MatMul
PiperOrigin-RevId: 638384686
2024-05-29 13:02:35 -07:00
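The structure of a 4x4 register-tiled kernel, shown as a scalar sketch for clarity (the actual kernel is vectorized with Highway): each tile keeps sixteen accumulators live, so every element loaded in the inner loop feeds four multiply-adds, improving arithmetic intensity over a naive triple loop.

```
#include <cstddef>

// Scalar sketch of a 4x4-tiled matmul: sixteen accumulators per tile, so
// each A and B element loaded in the inner loop feeds four multiply-adds.
// Assumes rows and cols are multiples of 4 for brevity.
void MatMul4x4(const float* A, const float* B, float* C,
               size_t rows, size_t inner, size_t cols) {
  for (size_t r = 0; r < rows; r += 4) {
    for (size_t c = 0; c < cols; c += 4) {
      float acc[4][4] = {};
      for (size_t k = 0; k < inner; ++k) {
        const float a0 = A[(r + 0) * inner + k];
        const float a1 = A[(r + 1) * inner + k];
        const float a2 = A[(r + 2) * inner + k];
        const float a3 = A[(r + 3) * inner + k];
        for (size_t j = 0; j < 4; ++j) {
          const float b = B[k * cols + c + j];
          acc[0][j] += a0 * b;
          acc[1][j] += a1 * b;
          acc[2][j] += a2 * b;
          acc[3][j] += a3 * b;
        }
      }
      for (size_t i = 0; i < 4; ++i)
        for (size_t j = 0; j < 4; ++j)
          C[(r + i) * cols + c + j] = acc[i][j];
    }
  }
}
```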
Zoltan Szabadka 542ad0973a Fix normalization in Softmax function. 2024-05-24 08:58:31 +00:00
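For reference, the standard numerically stable form such a normalization fix converges to: subtract the maximum before exponentiating, then divide by the sum so the outputs total 1. A scalar sketch:

```
#include <algorithm>
#include <cmath>
#include <cstddef>

// Numerically stable softmax: shift by the max so exp() cannot overflow,
// then normalize so the outputs sum to 1.
void Softmax(float* x, size_t n) {
  const float max_val = *std::max_element(x, x + n);
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    x[i] = std::exp(x[i] - max_val);
    sum += x[i];
  }
  const float inv_sum = 1.0f / sum;
  for (size_t i = 0; i < n; ++i) x[i] *= inv_sum;
}
```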
Jan Wassenberg f6d02b2870 Fix RecurrentGemma (refs #166): one Dot was ignoring scale.
Remove extra Dot() overload
MatVecAdd always adds; use MatVecT<kAdd> if the addition is conditional.
Remove unused MatVecAddLoop and MatVecLoop
No longer tsan-verify even_odd

PiperOrigin-RevId: 631377279
2024-05-07 04:40:42 -07:00
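The MatVecT<kAdd> pattern mentioned above, sketched with kAdd as a bool template parameter (illustrative; the production version also handles compressed weight types): the add/overwrite choice is resolved at compile time, so the loop body carries no runtime branch.

```
#include <cstddef>

// Compile-time toggle: kAdd=true accumulates into `out`, kAdd=false
// overwrites it. The branch is resolved at compile time.
template <bool kAdd>
void MatVecT(const float* m, const float* v, float* out,
             size_t rows, size_t cols) {
  for (size_t r = 0; r < rows; ++r) {
    float sum = 0.0f;
    for (size_t c = 0; c < cols; ++c) sum += m[r * cols + c] * v[c];
    if constexpr (kAdd) {
      out[r] += sum;
    } else {
      out[r] = sum;
    }
  }
}
// Usage: MatVecT<true> behaves like MatVecAdd; MatVecT<false> like MatVec.
```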
Phil Culliton 28ca001d5e Matmul and test functions
PiperOrigin-RevId: 630373984
2024-05-03 06:39:36 -07:00
Zoltan Szabadka 9a2682d544 Use more parallelism in the QKV projections of the MHA block.
We compute all three projections with one MatVec and then copy
the kv part to the cache.

Benchmark results for the 7b-it model, which uses MHA blocks (summarization
with 1600 tokens of prefill and essay writing with 500 tokens of generation):

```
                   Prefill speed                Generation speed
Num threads      BEFORE       AFTER            BEFORE       AFTER
32               13.75 t/s    14.80 t/s       9.22 t/s     9.77 t/s
64               19.89 t/s    24.83 t/s      12.46 t/s    13.66 t/s
```
2024-05-02 13:46:45 +00:00
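In outline: stacking the Q, K and V weight matrices lets one (q_dim + 2 * kv_dim)-row MatVec replace three smaller calls, giving the thread pool more rows to split; the K/V slice of the result is then copied into the cache. A hypothetical scalar sketch, all names illustrative:

```
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch: w_qkv stacks the Q, K and V weight matrices row-wise,
// so one (q_dim + 2 * kv_dim) x model_dim MatVec computes all three
// projections, exposing more rows for the thread pool to parallelize over.
void FusedQKV(const float* w_qkv, const float* x, size_t model_dim,
              size_t q_dim, size_t kv_dim, float* q_out, float* kv_cache) {
  const size_t out_dim = q_dim + 2 * kv_dim;
  std::vector<float> qkv(out_dim);
  for (size_t r = 0; r < out_dim; ++r) {  // the single fused MatVec
    float sum = 0.0f;
    for (size_t c = 0; c < model_dim; ++c) {
      sum += w_qkv[r * model_dim + c] * x[c];
    }
    qkv[r] = sum;
  }
  // Split the result: Q stays with the attention computation; the K and V
  // part is copied into the cache, as the commit message describes.
  std::memcpy(q_out, qkv.data(), q_dim * sizeof(float));
  std::memcpy(kv_cache, qkv.data() + q_dim, 2 * kv_dim * sizeof(float));
}
```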
Jan Wassenberg 12fb2f05cf Add per-thread even_odd storage for #166.
Also inline the ProjQ and ProjKV lambdas and
add missing includes/deps for ops_test.

PiperOrigin-RevId: 629460608
2024-04-30 10:42:23 -07:00
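The general pattern behind per-thread storage like this even_odd buffer: preallocate one scratch region per pool worker and index it by thread id, so workers never contend for (or false-share) a buffer. A generic sketch, not gemma.cpp's actual layout:

```
#include <cstddef>
#include <vector>

// Generic per-thread scratch: one buffer per worker, indexed by thread id,
// so no locking is needed and workers do not share cache lines.
struct PerThreadScratch {
  PerThreadScratch(size_t num_threads, size_t len)
      : buffers(num_threads, std::vector<float>(len)) {}
  float* Get(size_t thread_id) { return buffers[thread_id].data(); }
  std::vector<std::vector<float>> buffers;
};
```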
Jan Wassenberg a982ec1287 Move code to gemma/ so we can remove error-prone copybara: comments.
Also fix includes and Lint warnings.

PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
Renamed from ops_test.cc