gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	02ce1e344f	Use NestedPools, add NUMA infra. Improved threading.h, fix thread counts for single package/cluster systems. Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92. Also fix benchmarks.cc build, update tensor allocator to Allocator. PiperOrigin-RevId: 687307167	2024-10-18 08:11:18 -07:00
Ray Smith	0d68555f87	Eliminated TConfig. Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively. Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly. Adjusted WeightsWrapper and ForwardLayer etc to match. The only remaining template arg is the weight type. This enables all the instantiations to be deleted, apart from one per type. It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately. Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively. PiperOrigin-RevId: 686870060	2024-10-17 05:04:22 -07:00
Daniel Keysers	a4d6adbc43	Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize. Remove max_tokens (and rely on only max_generated_tokens). PiperOrigin-RevId: 685662260	2024-10-14 04:45:21 -07:00
Jan Wassenberg	6ab3ff5bde	Minor cleanup, Windows+Bazel build fixes. add app.h comment; compress-inl: remove unused typedef; gemma-inl: add missing HWY_ATTR and cast; separate sum-inl.h and basics.h headers; replace more hwy::bfloat16_t with BF16; update include pragmas; update dot_test thresholds; update Highway version in Bazel for HWY_RCAST_ALIGNED fix. PiperOrigin-RevId: 684464326	2024-10-10 09:05:06 -07:00
Jan Wassenberg	2c28b18eb0	Add NestedPools: one per socket/cluster. Use in dot_test. app.h: add new flags and rename num_threads to max_threads. matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases. PiperOrigin-RevId: 683216386	2024-10-07 09:40:19 -07:00
RangerUFO	42ab476a9a	Fix the file name conflicts on case-insensitive systems	2024-09-19 13:54:35 +08:00
Paul Chang	22d9476aad	Demonstrate constrained decoding in gemma_cpp's hello world example PiperOrigin-RevId: 669327521	2024-08-30 08:03:07 -07:00
Jan Wassenberg	22995c699d	Simplify pos handling, auto-increment output arg - no longer multiply by num_queries - remove unused interleaved prompts - Rename to Queries* - Rename batch_start/interleaved_pos/pos to queries_pos PiperOrigin-RevId: 663331823	2024-08-15 09:25:26 -07:00
Jan Wassenberg	282f73ec2f	Add pin flag to disable pinning. Refs #338 PiperOrigin-RevId: 661389171	2024-08-09 13:47:12 -07:00
Jan Wassenberg	5e433e774a	1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism. Limit thread counts to detected. Add max_clusters arg. Update detection logic to check for smt0 - previously we pinned to some siblings. PiperOrigin-RevId: 659755311	2024-08-05 18:50:09 -07:00
Jan Wassenberg	aaf51898b6	Major revamp #2 of Prefill: fix token order, parallel for multi-query - Allocate only the required KV caches and activation batch size - Add flags for batch sizes - Const-correct interface: Span of const int. - Also clean up the KVCache arg to a span. - Move kPrefillBatchSize into RuntimeConfig and remove related global constants. PiperOrigin-RevId: 655893197	2024-07-25 03:28:55 -07:00
Paul Chang	48b900b1b9	Fix examples/hello_world for real. PiperOrigin-RevId: 652509319	2024-07-15 09:38:52 -07:00
Paul Chang	aaee666a1d	Fix gemma_cpp/examples/hello_world build. Include Bazel build rules, too. PiperOrigin-RevId: 652469406	2024-07-15 07:11:01 -07:00
Jan Wassenberg	3e2396f98c	Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc. accept_token: allow default, check if empty when using; allow mixing sample_func and stream_func, call the latter after the former. Also fix missing includes/deps. PiperOrigin-RevId: 642240012	2024-06-11 05:53:10 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl. Split common and weights into separate lib. Remove common-inl (does not have to be SIMD code), activations.cc. Centralize switch(Model) to avoid duplication. Move CompressWeightsT to compress_weights.cc. Move LoadWeights to weights.cc. PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00
Luca Versari	5862d1f995	Add a benchmark and additional tests. Also add a script to help running sanitizer builds, and do some cleanup. Co-authored-by: Andrey Mikhaylov <amik@google.com> Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Sami Boukortt <sboukortt@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 12:54:52 +02:00
Luca Versari	4c23932289	Improve weight handling. - Allow scaling of SFP weights - Allow using uncompressed weights - Do not try to compress weights in the main model calls - Reduce code duplication in weight handling with some macros Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Thomas Fischbacher <tfish@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 11:08:47 +02:00
Jan Wassenberg	ba86c8d590	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-21 04:19:02 +01:00
Eric Ye	89be4c3de8	No public description PiperOrigin-RevId: 617315030	2024-03-21 04:18:36 +01:00
Jan Wassenberg	06cea2bcdb	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 23:37:39 +01:00
Eric Ye	ffd02c59ad	No public description PiperOrigin-RevId: 617315030	2024-03-20 23:37:12 +01:00
Jan Wassenberg	7d5364bb80	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 11:31:59 -07:00
austinvhuang	810b5a0cc2	Update README with more details on contributing code, add experimental/ directory, add READMEs for subdirectories, clean up DEVELOPER notes	2024-03-15 14:10:24 -04:00
austinvhuang	72247614bb	fix prefill feedback off-by-1, update fetch commit hash	2024-03-12 15:10:44 -04:00
austinvhuang	60d054e041	move arg definitions out of gemma.h to app.h	2024-03-10 23:49:25 -04:00
austinvhuang	0fc80fad05	libgemma refactor - review changes	2024-03-10 12:55:08 -04:00
austinvhuang	cc5c24c4f8	remove app.h dependency + fix bazel build	2024-03-08 18:06:43 -05:00
austinvhuang	8c7b2cf61b	add README, license to hello_world	2024-03-08 17:59:54 -05:00
austinvhuang	571a5449c4	update commit hash for gemma lib	2024-03-08 17:33:33 -05:00
austinvhuang	03147effbd	update loader arg names: cache -> compressed_weights, model -> weights	2024-03-08 17:32:36 -05:00
austinvhuang	dfd2fdc1dd	Decouple gemma constructor from loader args, update hello_world example, add convenience version of constructor (no uncompressed weights)	2024-03-08 17:26:03 -05:00
austinvhuang	42e53e2da8	[WIP] simplify hello world example, add convenience function. TODO: update git hash in CMakeLists.txt of hello world after push	2024-03-08 14:56:22 -05:00
austinvhuang	49e654258d	[WIP] clean up hello_world #includes and CMakeLists.txt	2024-03-07 01:04:25 -05:00
austinvhuang	e781007836	[WIP] Remove InferenceArgs from hello_world example, fix ordering of LoaderArgs validation, revert ReplGemma EOT token behavior	2024-03-06 23:21:13 -05:00
austinvhuang	7042316013	[WIP] update GemmaInterface, Gemma, and Generate input parameter specs to remove InferenceArgs. TODO: update hello_world example after git commit hash is available for fetching	2024-03-06 22:22:59 -05:00
austinvhuang	10f7a086aa	[WIP] decouple GemmaImpl from CLI args	2024-03-06 15:06:41 -05:00
austinvhuang	c378ac2c56	[WIP] hello world example working. TODO: refactor interfaces to decouple arguments	2024-03-03 11:36:48 -05:00
austinvhuang	39cd59caec	[WIP] create skeleton for example frontend application	2024-03-03 10:33:29 -05:00

40 Commits