happyz/gemma.cpp - HappyGit

Author	SHA1	Message	Date
Jan Wassenberg	d3c6a45b59	Major duplicated code reduction in test/benchmarks. Helper functions to tokenize/wrap. Move LayersOutputFunc into RuntimeConfig. AcceptFunc passes the probability. Implement StringFromType using the parser, and verify results match. PiperOrigin-RevId: 643255119	2024-06-14 00:16:25 -07:00
The gemma.cpp Authors	2a0e6ee976	Fix numerical issue in Softcap by subtracting max. Also update test threshold. PiperOrigin-RevId: 642587468	2024-06-12 05:42:16 -07:00
Jan Wassenberg	3e2396f98c	Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc. accept_token: allow default, check if empty when using. Allow mixing sample_func and stream_func, call the latter after the former. Also fix missing includes/deps. PiperOrigin-RevId: 642240012	2024-06-11 05:53:10 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Zoltan Szabadka	465998d25a	Add support for custom sampling function to runtime config. With this addition the ComputeCrossEntropy function can be moved to its own library, because now we can compute it using only the public API functions from gemma.h	2024-06-07 11:45:07 +00:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl. Split common and weights into separate lib. Remove common-inl (does not have to be SIMD code), activations.cc. Centralize switch(Model) to avoid duplication. Move CompressWeightsT to compress_weights.cc. Move LoadWeights to weights.cc. PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Zoltan Szabadka	36e4d8bbfe	Add first version of backpropagation support. This is still in progress / experimental; currently it is only implemented for normal gemma MQA attention layers, and no parallelism is added yet for the backward pass. Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.	2024-06-04 08:37:49 +00:00
Apoorv Reddy	eb0b96e0a8	Pass most runtime parameters using const RuntimeConfig& PiperOrigin-RevId: 633572507	2024-05-14 07:04:53 -07:00
Zoltan Szabadka	27117cc39f	Simplify threading: remove the use of inner_pool. We only used inner_pool in the prefill FFW function, and there we can achieve sufficient parallelism on the rows of the matrix-vector multiplications. Benchmark results on a 1600-token summarization task, prefill speed by number of threads (before, after): 4 threads: 9.24 t/s, 9.76 t/s; 18 threads: 31.41 t/s, 31.16 t/s; 32 threads: 31.41 t/s, 45.13 t/s; 64 threads: 31.03 t/s, 57.85 t/s.	2024-04-29 16:07:30 +00:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00