gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Paul Chang	22d9476aad	Demonstrate constrained decoding in gemma_cpp's hello world example PiperOrigin-RevId: 669327521	2024-08-30 08:03:07 -07:00
Jan Wassenberg	22995c699d	Simplify pos handling, auto-increment output arg - no longer multiply by num_queries - remove unused interleaved prompts - Rename to Queries* - Rename batch_start/interleaved_pos/pos to queries_pos PiperOrigin-RevId: 663331823	2024-08-15 09:25:26 -07:00
Jan Wassenberg	282f73ec2f	Add pin flag to disable pinning. Refs #338 PiperOrigin-RevId: 661389171	2024-08-09 13:47:12 -07:00
Jan Wassenberg	5e433e774a	1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism. Limit thread counts to detected. Add max_clusters arg. Update detection logic to check for smt0 - previously we pinned to some siblings. PiperOrigin-RevId: 659755311	2024-08-05 18:50:09 -07:00
Jan Wassenberg	aaf51898b6	Major revamp #2 of Prefill: fix token order, parallel for multi-query - Allocate only the required KV caches and activation batch size - Add flags for batch sizes - Const-correct interface: Span of const int. - Also clean up the KVCache arg to a span. - Move kPrefillBatchSize into RuntimeConfig and remove related global constants. PiperOrigin-RevId: 655893197	2024-07-25 03:28:55 -07:00
Paul Chang	48b900b1b9	Fix examples/hello_world for real. PiperOrigin-RevId: 652509319	2024-07-15 09:38:52 -07:00
Paul Chang	aaee666a1d	Fix gemma_cpp/examples/hello_world build. Include Bazel build rules, too. PiperOrigin-RevId: 652469406	2024-07-15 07:11:01 -07:00
Jan Wassenberg	3e2396f98c	Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc. accept_token: allow default, check if empty when using. Allow mixing sample_func and stream_func, call the latter after the former. Also fix missing includes/deps. PiperOrigin-RevId: 642240012	2024-06-11 05:53:10 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl. Split common and weights into separate lib. Remove common-inl (does not have to be SIMD code), activations.cc. Centralize switch(Model) to avoid duplication. Move CompressWeightsT to compress_weights.cc. Move LoadWeights to weights.cc. PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00
Luca Versari	5862d1f995	Add a benchmark and additional tests. Also add a script to help running sanitizer builds, and do some cleanup. Co-authored-by: Andrey Mikhaylov <amik@google.com> Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Sami Boukortt <sboukortt@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 12:54:52 +02:00
Luca Versari	4c23932289	Improve weight handling. - Allow scaling of SFP weights - Allow using uncompressed weights - Do not try to compress weights in the main model calls - Reduce code duplication in weight handling with some macros Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Thomas Fischbacher <tfish@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 11:08:47 +02:00
Jan Wassenberg	ba86c8d590	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-21 04:19:02 +01:00
Eric Ye	89be4c3de8	No public description PiperOrigin-RevId: 617315030	2024-03-21 04:18:36 +01:00
Jan Wassenberg	06cea2bcdb	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 23:37:39 +01:00
Eric Ye	ffd02c59ad	No public description PiperOrigin-RevId: 617315030	2024-03-20 23:37:12 +01:00
Jan Wassenberg	7d5364bb80	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 11:31:59 -07:00
austinvhuang	810b5a0cc2	Update README with more details on contributing code, add experimental/ directory, add READMEs for subdirectories, clean up DEVELOPER notes	2024-03-15 14:10:24 -04:00
austinvhuang	72247614bb	fix prefill feedback off-by-1, update fetch commit hash	2024-03-12 15:10:44 -04:00
austinvhuang	60d054e041	move arg definitions out of gemma.h to app.h	2024-03-10 23:49:25 -04:00
austinvhuang	0fc80fad05	libgemma refactor - review changes	2024-03-10 12:55:08 -04:00
austinvhuang	cc5c24c4f8	remove app.h dependency + fix bazel build	2024-03-08 18:06:43 -05:00
austinvhuang	8c7b2cf61b	add README, license to hello_world	2024-03-08 17:59:54 -05:00
austinvhuang	571a5449c4	update commit hash for gemma lib	2024-03-08 17:33:33 -05:00
austinvhuang	03147effbd	update loader arg names: cache -> compressed_weights, model -> weights	2024-03-08 17:32:36 -05:00
austinvhuang	dfd2fdc1dd	Decouple gemma constructor from loader args, update hello_world example, add convenience version of constructor (no uncompressed weights)	2024-03-08 17:26:03 -05:00
austinvhuang	42e53e2da8	[WIP] simplify hello world example, add convenience function. TODO: update git hash in CMakeLists.txt of hello world after push	2024-03-08 14:56:22 -05:00
austinvhuang	49e654258d	[WIP] clean up hello_world #includes and CMakeLists.txt	2024-03-07 01:04:25 -05:00
austinvhuang	e781007836	[WIP] Remove InferenceArgs from hello_world example, fix ordering of LoaderArgs validation, revert ReplGemma EOT token behavior	2024-03-06 23:21:13 -05:00
austinvhuang	7042316013	[WIP] update GemmaInterface, Gemma, and Generate input parameter specs to remove InferenceArgs. TODO: update hello_world example after git commit hash is available for fetching	2024-03-06 22:22:59 -05:00
austinvhuang	10f7a086aa	[WIP] decouple GemmaImpl from CLI args	2024-03-06 15:06:41 -05:00
austinvhuang	c378ac2c56	[WIP] hello world example working. TODO: refactor interfaces to decouple arguments	2024-03-03 11:36:48 -05:00
austinvhuang	39cd59caec	[WIP] create skeleton for example frontend application	2024-03-03 10:33:29 -05:00

34 Commits