gemma.cpp

Commit Graph

Author	SHA1	Message	Date
The gemma.cpp Authors	0e612d9a20	Split out common parts (embedder and transformer block) from Prefill() and Transformer() into separate functions. PiperOrigin-RevId: 644455520	2024-06-18 11:24:56 -07:00
Paul Chang	d7d9d14f0e	Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly. For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero and kGemmaLayers is just the number of layers. PiperOrigin-RevId: 644384531	2024-06-18 07:52:52 -07:00
Jan Wassenberg	70506b0a62	Fix debug_prompt and other binaries (internal init) PiperOrigin-RevId: 644367683	2024-06-18 06:48:59 -07:00
Jan Wassenberg	15135f5b3d	Simplify Attention. Shared kMHA, reuse from Activations, inline Attn lambda, use QDim as the stride between successive Q. PiperOrigin-RevId: 644343854	2024-06-18 05:08:12 -07:00
Jan Wassenberg	2ac47e4a06	Fix Py binding/run_example: use GemmaEnv PiperOrigin-RevId: 644318962	2024-06-18 03:20:22 -07:00
Jan Wassenberg	a07f60c9a1	1.15x 7b sfp prefill speedup: Matmul in attention. 2b bf16: prefill 114.456 -> 115.222, decode 16.8847 -> 16.9987. 7b sfp: prefill 18.8575 -> 21.7325, decode 5.68428 -> 5.79791. PiperOrigin-RevId: 644283676	2024-06-18 01:00:51 -07:00
Jan Wassenberg	355f7b4f80	Update developer docs and mention asan/msan PiperOrigin-RevId: 644000220	2024-06-17 07:29:12 -07:00
Jan Wassenberg	704d936764	Further simplification to ForEachTensor, thanks I.K. PiperOrigin-RevId: 643996210	2024-06-17 07:12:26 -07:00
Jan Wassenberg	7d0720675f	Move raw_weights into separate header, used mainly by compress_weights. Fix warnings in backprop/* (include) PiperOrigin-RevId: 643983136	2024-06-17 06:17:02 -07:00
Jan Wassenberg	ad790d89d1	Fix DASSERT - TiledBatch requires at least 2 vectors. Also use shorthand for weight types. PiperOrigin-RevId: 643958371	2024-06-17 04:29:01 -07:00
The gemma.cpp Authors	7dbfa44794	Refactor CompressedWeights. PiperOrigin-RevId: 643934198	2024-06-17 02:54:54 -07:00
Ray Smith	e0afdfa8fb	Added bias vector addition to MatMul PiperOrigin-RevId: 643385381	2024-06-14 10:25:16 -07:00
The gemma.cpp Authors	2228055bb8	Internal change. PiperOrigin-RevId: 643330703	2024-06-14 06:53:41 -07:00
Jan Wassenberg	29c0c574e6	Integrate matmul into FFW: 4.3x prefill speedup. Before, bf16: 27.2929 prefill tokens / sec, 17.2114 tokens / sec. After, bf16: 116.496 prefill tokens / sec, 17.5391 tokens / sec. PiperOrigin-RevId: 643328437	2024-06-14 06:32:26 -07:00
Ray Smith	198326a682	Removed now redundant non-batch matmul PiperOrigin-RevId: 643317187	2024-06-14 05:13:36 -07:00
Andrey Vlasov	b17631c95f	Implement a missing (bf16, f32) tiled MatMul kernel. PiperOrigin-RevId: 643313676	2024-06-14 04:54:40 -07:00
Jan Wassenberg	d3c6a45b59	Major duplicated code reduction in test/benchmarks. Helper functions to tokenize/wrap. Move LayersOutputFunc into RuntimeConfig. AcceptFunc passes the probability. Implement StringFromType using the parser, and verify results match. PiperOrigin-RevId: 643255119	2024-06-14 00:16:25 -07:00
Jan Wassenberg	c15ff9529c	Reduce duplication in Config* by inheriting no-SSM PiperOrigin-RevId: 643030629	2024-06-13 09:48:56 -07:00
Ray Smith	ea525da967	Added MatMul_4x4_Batch which is MatMul_4x4, but with the first template arg moved to the first function arg, so the batch size (num A rows) can be variable at run-time. PiperOrigin-RevId: 643017973	2024-06-13 09:05:40 -07:00
The gemma.cpp Authors	1b40619864	Increase parallelism in ops_test PiperOrigin-RevId: 643013415	2024-06-13 08:50:41 -07:00
Andrey Vlasov	bf78a065e1	Make gemma/ops_test `large`. PiperOrigin-RevId: 642923146	2024-06-13 03:33:46 -07:00
Andrey Vlasov	38eb452b94	Support mixed (bf16, sfp) tiled MatMul. Same sfp-decompress strategy as in (f32, sfp) tiled MatMul. PiperOrigin-RevId: 642901844	2024-06-13 02:07:21 -07:00
Daniel Keysers	6e67a6d8a9	Tiny cleanup: distinguish between "ids" and "pieces" in argument names when encoding. PiperOrigin-RevId: 642614278	2024-06-12 07:52:13 -07:00
Daniel Keysers	1ac9857014	Extends Transformer() to prepare for batched processing. PiperOrigin-RevId: 642603025	2024-06-12 07:01:03 -07:00
The gemma.cpp Authors	2a0e6ee976	Fix numerical issue in Softcap by subtracting max. Also update test threshold. PiperOrigin-RevId: 642587468	2024-06-12 05:42:16 -07:00
Copybara-Service	e37447cfe2	Merge pull request #234 from szabadka:build-fix PiperOrigin-RevId: 642551103	2024-06-12 02:29:21 -07:00
Zoltan Szabadka	d98523187c	Add benchmark dependency to cmake build.	2024-06-12 08:14:29 +00:00
The gemma.cpp Authors	f467670de7	Implement float * SfpStream matmul by decompressing 4 * kColsA_RowsB -sized chunks of the second matrix. PiperOrigin-RevId: 642533996	2024-06-12 01:11:59 -07:00
Zoltan Szabadka	9c869c4655	Revert "Add benchmark dependency to cmake build". This reverts commit `12ce91a163`. Reason: accidentally pushed directly to dev branch, will redo with a PR and copybara-import.	2024-06-12 07:56:03 +00:00
Zoltan Szabadka	12ce91a163	Add benchmark dependency to cmake build	2024-06-12 07:09:15 +00:00
Ray Smith	bdf33c7008	Updated benchmarks.cc to recent changes to Gemma API. PiperOrigin-RevId: 642285902	2024-06-11 08:55:40 -07:00
Phil Culliton	b6565e3bf6	Update AssertClose for large matrices and add large matrix test PiperOrigin-RevId: 642277221	2024-06-11 08:22:47 -07:00
Daniel Keysers	8ec8eef524	Add internal initialization code to debug_prompt. PiperOrigin-RevId: 642276350	2024-06-11 08:19:38 -07:00
The gemma.cpp Authors	57d0ea95d0	Add buildcleaner: keep pragma to a dep in ops_test build rule and run build_cleaner. PiperOrigin-RevId: 642275845	2024-06-11 08:17:47 -07:00
Jan Wassenberg	3e2396f98c	Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc. accept_token: allow default, check if empty when using. Allow mixing sample_func and stream_func, call the latter after the former. Also fix missing includes/deps. PiperOrigin-RevId: 642240012	2024-06-11 05:53:10 -07:00
Jan Wassenberg	a0e808e341	Add compression/ comments, especially on SFP range PiperOrigin-RevId: 642238720	2024-06-11 05:47:49 -07:00
Daniel Keysers	c557ad23a8	Adds simple-loop versions of missing batched functions. PiperOrigin-RevId: 642189741	2024-06-11 02:14:02 -07:00
Jan Wassenberg	c7f5e93136	Update benchmark with internal init PiperOrigin-RevId: 641929308	2024-06-10 09:35:16 -07:00
Copybara-Service	49d814b519	Merge pull request #224 from szabadka:cleanup PiperOrigin-RevId: 641922102	2024-06-10 09:11:13 -07:00
Jan Wassenberg	c1c6714ad4	Internal experiment PiperOrigin-RevId: 641915024	2024-06-10 08:46:10 -07:00
Zoltan Szabadka	6ca4a8e345	Address review comments	2024-06-10 15:27:22 +00:00
Zoltan Szabadka	a3a75b77f9	Use CompressedWeights<TConfig<float>> in backpropagation. kWeightsAreCompressed are removed and LoadRawWeights is moved to compress_weights.cc	2024-06-10 14:34:24 +00:00
Jan Wassenberg	95fd7263ae	Add missing test deps PiperOrigin-RevId: 641880024	2024-06-10 06:22:07 -07:00
Phil Culliton	c5bcb5438c	Fix for transpose matrix creation and additional tests PiperOrigin-RevId: 641868053	2024-06-10 05:24:04 -07:00
Jan Wassenberg	36e6915e18	Add CPU output, error if not C++17, simplify tokenizer ctor PiperOrigin-RevId: 641850879	2024-06-10 04:01:11 -07:00
The gemma.cpp Authors	020db5a67d	No public description PiperOrigin-RevId: 641816837	2024-06-10 01:12:42 -07:00
Phil Culliton	d985d8b867	Shifting large matrix init to heap in ops_test.cc PiperOrigin-RevId: 641311100	2024-06-07 11:38:42 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Copybara-Service	24db2ff725	Merge pull request #217 from szabadka:cross-entropy PiperOrigin-RevId: 641241133	2024-06-07 07:17:35 -07:00
Daniel Keysers	06f814fc8b	Small code cleanup suggestions while reading the code. PiperOrigin-RevId: 641220788	2024-06-07 05:33:17 -07:00

434 Commits (All Branches)