gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	d3c6a45b59	Major duplicated code reduction in test/benchmarks; Helper functions to tokenize/wrap; Move LayersOutputFunc into RuntimeConfig; AcceptFunc passes the probability; Implement StringFromType using the parser, and verify results match. PiperOrigin-RevId: 643255119	2024-06-14 00:16:25 -07:00
Daniel Keysers	8ec8eef524	Add internal initialization code to debug_prompt. PiperOrigin-RevId: 642276350	2024-06-11 08:19:38 -07:00
Jan Wassenberg	3e2396f98c	Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc; accept_token: allow default, check if empty when using; allow mixing sample_func and stream_func, call the latter after the former; Also fix missing includes/deps. PiperOrigin-RevId: 642240012	2024-06-11 05:53:10 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl; Split common and weights into separate lib; Remove common-inl (does not have to be SIMD code), activations.cc; Centralize switch(Model) to avoid duplication; Move CompressWeightsT to compress_weights.cc; Move LoadWeights to weights.cc. PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Zelalem Aweke	9e213b3d96	Use system topology to pin threads across clusters. PiperOrigin-RevId: 640151974	2024-06-04 07:50:32 -07:00
Apoorv Reddy	7f4b85d00b	Add MMLU eval to github PiperOrigin-RevId: 635495178	2024-05-20 10:20:53 -07:00
Apoorv Reddy	eb0b96e0a8	Pass most runtime parameters using const RuntimeConfig& PiperOrigin-RevId: 633572507	2024-05-14 07:04:53 -07:00
Apoorv Reddy	f1eab987d8	Store tokens/sec in auxiliary struct TimingInfo. PiperOrigin-RevId: 633108908	2024-05-13 00:04:19 -07:00
Copybara-Service	befe9fb07e	Merge pull request #167 from szabadka:gemma2 PiperOrigin-RevId: 629325219	2024-04-30 01:00:37 -07:00
Zoltan Szabadka	27117cc39f	Simplify threading: remove the use of inner_pool. We only used inner_pool in the prefill FFW function, and there we can achieve sufficient parallelism on the rows of the matrix-vector multiplications. Benchmark results on a 1600-token summarization task (prefill speed, BEFORE -> AFTER): 4 threads: 9.24 t/s -> 9.76 t/s; 18 threads: 31.41 t/s -> 31.16 t/s; 32 threads: 31.41 t/s -> 45.13 t/s; 64 threads: 31.03 t/s -> 57.85 t/s.	2024-04-29 16:07:30 +00:00
Andrey Mikhaylov	4ef3da733a	Fixed minor things and added comments.	2024-04-12 15:39:16 +00:00
Andrey Mikhaylov	03284d752e	Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file.	2024-04-12 15:39:16 +00:00

13 Commits