gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	2c28b18eb0	Add NestedPools: one per socket/cluster Use in dot_test app.h: add new flags and rename num_threads to max_threads matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases PiperOrigin-RevId: 683216386	2024-10-07 09:40:19 -07:00
Jan Wassenberg	bd53b0f7c3	Fix MSAN issue for multiturn. Rewind the prior EOS token. Also move MaybeCheckInitialized to allocator.h PiperOrigin-RevId: 683187458	2024-10-07 08:07:54 -07:00
Jan Wassenberg	2d14d796e3	1.09x decode speedup for topk=1/temp0: fuse softmax and sample PiperOrigin-RevId: 680589099	2024-09-30 08:37:41 -07:00
Daniel Keysers	f8835fe4a4	Add support for PaliGemma Vision-LM (224x224) to gemma.cpp See https://arxiv.org/abs/2407.07726 for a description of the model. Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that. PiperOrigin-RevId: 677841119	2024-09-23 10:09:38 -07:00
RangerUFO	62be3b98ce	Fix the warnings complained by Clang	2024-09-19 13:57:24 +08:00
Daniel Keysers	e4ba93412a	Add const batch accessor to RowVectorBatch. PiperOrigin-RevId: 675530484	2024-09-17 05:42:14 -07:00
Jan Wassenberg	8c0a8834c1	Major compression update, arbitrary-len unpack + new Dot Compression: * Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad * New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test * Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking * NUQ: support arbitrary-length enc/dec * New compression/shared, remove sfp.h and nuq.h * Move Store2 into Traits and provide Compress2 wrapper * Remove unused Decompress()-with-pool overload * Simplify CompressedArrayLen, rename to CompressedArrayElements * Remove unused DistortionStats b_l1_ Misc: * Add compensated and Kahan dot, support any length * Use same Dot function everywhere * Move exact arithmetic functions into fp_arith * use FloatPtr and MatPtr typedefs in tests; less stack usage * Rename args to packed/raw * Remove Traits::Name, instead TypeName<T>() * Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream PiperOrigin-RevId: 672868468	2024-09-10 02:22:19 -07:00
Jan Wassenberg	5c0da8c8c3	Minor cleanup/fixes: - optimize_test simplify prompt check - Fix SFP arg case - Fix includes - Align inputs in test - IsInside: add DASSERT - Fix PerClusterPool NumThreads PiperOrigin-RevId: 672530385	2024-09-09 06:58:09 -07:00
Daniel Keysers	a8e08778d4	Add an additional QueryModel() overload to GemmaEnv. Use args only in GemmaEnv constructor, store everything else in RuntimeConfig. Add runtime option to turn off thread spinning. PiperOrigin-RevId: 670467320	2024-09-03 02:25:19 -07:00
Jan Wassenberg	4033ed9e78	Avoid duplication of RMSNorm, support all activation/weight types Add test for RMSNorm Rename VectorizedRopeAndMulBy -> RopeAndMulBy Move test_util to util/ PiperOrigin-RevId: 668332927	2024-08-28 01:26:55 -07:00
Jan Wassenberg	301dc8067a	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul Supports converting all weight/activation formats to native MulT (bf16/f32) Also: - ConstMat/MutableMat for const correctness - Move RowVectorBatch to allocator.h so it can be used from Matmul - Add matmul.h so MatMulEnv can be used from Activations - Remove kMaxThreads, detect from PerClusterPools - Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h ``` zen4 new 64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp: 616.6 GFLOPS. 64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp: 460.7 GFLOPS. 64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 598.6 GFLOPS. 64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 435.6 GFLOPS. zen4 old 64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 257.5 GFLOPS. 64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 231.9 GFLOPS. ``` PiperOrigin-RevId: 663729812	2024-08-16 07:52:20 -07:00
Jan Wassenberg	282f73ec2f	Add pin flag to disable pinning. Refs #338 PiperOrigin-RevId: 661389171	2024-08-09 13:47:12 -07:00
Jan Wassenberg	5e433e774a	1.1x prefill speedup, revamp threading in preparation for hierarchical parallelism. Limit thread counts to detected. Add max_clusters arg. Update detection logic to check for smt0 - previously we pinned to some siblings. PiperOrigin-RevId: 659755311	2024-08-05 18:50:09 -07:00
Jan Wassenberg	aaf51898b6	Major revamp #2 of Prefill: fix token order, parallel for multi-query - Allocate only the required KV caches and activation batch size - Add flags for batch sizes - Const-correct interface: Span of const int. - Also clean up the KVCache arg to a span. - Move kPrefillBatchSize into RuntimeConfig and remove related global constants. PiperOrigin-RevId: 655893197	2024-07-25 03:28:55 -07:00
The gemma.cpp Authors	74a6dc8f33	Use all CPU sockets when pinning threads to cores PiperOrigin-RevId: 654800375	2024-07-22 10:09:16 -07:00
Jan Wassenberg	ee6e017a77	Fix windows build: min conflict, unused VF PiperOrigin-RevId: 650955138	2024-07-10 04:18:25 -07:00
Jan Wassenberg	6a3f7cf3ea	Lint fix - string append, remove stale TODO PiperOrigin-RevId: 650197468	2024-07-08 04:11:21 -07:00
Jan Wassenberg	f823371691	Cleanup: move util/compress and convert_weights to compression/ Also remove unused models/, lint convert_weights PiperOrigin-RevId: 649613088	2024-07-05 04:16:52 -07:00
Jan Wassenberg	85fcd3cd80	Cleanup: add ModelInfo struct, remove gcpp:: PiperOrigin-RevId: 648707763	2024-07-02 07:11:15 -07:00
Jan Wassenberg	b1c1ec1d59	Use benchmark_helper in py bindings (adds BOS) Also remove thread clamp (OK to be zero or large). PiperOrigin-RevId: 648657155	2024-07-02 03:27:15 -07:00
Jan Wassenberg	af8eb2fde3	Declutter gemma/ directory, move binaries to evals/ and util/. PiperOrigin-RevId: 648400795	2024-07-01 09:51:04 -07:00
The gemma.cpp Authors	ef786f1bfc	Use hwy::ThreadPool::MaxThreads() to determine the number of threads to use. PiperOrigin-RevId: 646117298	2024-06-24 09:16:04 -07:00
Daniel Keysers	0570972d43	Fixing two typos. PiperOrigin-RevId: 645103198	2024-06-20 11:33:12 -07:00
Jan Wassenberg	d3c6a45b59	Major duplicated code reduction in test/benchmarks Helper functions to tokenize/wrap Move LayersOutputFunc into RuntimeConfig AcceptFunc passes the probability Implement StringFromType using the parser, and verify results match PiperOrigin-RevId: 643255119	2024-06-14 00:16:25 -07:00
Jan Wassenberg	3e2396f98c	Use Loader/AppArgs to construct gemma_test model, simplify AcceptFunc accept_token: allow default, check if empty when using allow mixing sample_func and stream_func, call the latter after the former Also fix missing includes/deps. PiperOrigin-RevId: 642240012	2024-06-11 05:53:10 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Zelalem Aweke	9e213b3d96	Use system topology to pin threads across clusters. PiperOrigin-RevId: 640151974	2024-06-04 07:50:32 -07:00
Jan Wassenberg	12fb2f05cf	Add per-thread even_odd storage for #166 . Also inline ProjQ and ProjKV lambdas, add missing includes/deps for ops_test. PiperOrigin-RevId: 629460608	2024-04-30 10:42:23 -07:00
Jan Wassenberg	7a12e29027	Add error-checking for py binding, add missing include+hwasan check PiperOrigin-RevId: 628453112	2024-04-26 10:59:41 -07:00
Phil Culliton	9e0ac5de34	Update Clif wrapper to work with latest gemma.cpp and add simple example PiperOrigin-RevId: 628134201	2024-04-25 11:17:16 -07:00
Jan Wassenberg	e9a0caed87	Further improve IO, enable multiple backends without -D. Move Path into io.h and use for opening files. Removes dependency of gemma_lib on args. Separate Windows codepath instead of emulating POSIX functions. Plus lint fixes. PiperOrigin-RevId: 626279004	2024-04-19 00:40:29 -07:00
Jan Wassenberg	a8ceb75f43	Improved IO abstraction layer Move to unique_ptr-like File class. Move `if OS_WIN` into wrapper functions. exists -> Exists. PiperOrigin-RevId: 625923056	2024-04-17 23:15:07 -07:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00
Luca Versari	9c3f969405	Implement the Griffin model. Also implement support for some model variations: - Local attention. - Add support for biases. - Use RoPE only on half vectors. - Support different order of QKV weights. Co-authored-by: Andrey Mikhaylov <amik@google.com> Co-authored-by: Martin Bruse <zondolfin@gmail.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-08 21:45:54 +02:00
Luca Versari	5862d1f995	Add a benchmark and additional tests. Also add a script to help running sanitizer builds, and do some cleanup. Co-authored-by: Andrey Mikhaylov <amik@google.com> Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Sami Boukortt <sboukortt@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 12:54:52 +02:00
Luca Versari	4c23932289	Improve weight handling. - Allow scaling of SFP weights - Allow using uncompressed weights - Do not try to compress weights in the main model calls - Reduce code duplication in weight handling with some macros Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Thomas Fischbacher <tfish@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 11:08:47 +02:00
Zoltan Szabadka	b670d43e4f	Add standalone tool to compress weights. Co-authored-by: Eugene Kliuchnikov <eustas@google.com>	2024-04-03 14:54:08 +00:00
Copybara-Service	bbf4df4584	Merge pull request #115 from villesundell:patch-1 PiperOrigin-RevId: 619262700	2024-03-26 11:46:54 -07:00
Copybara-Service	fcf5c1af88	Merge pull request #114 from ufownl:experimental PiperOrigin-RevId: 618148701	2024-03-22 05:36:07 -07:00
Jan Wassenberg	ba86c8d590	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-21 04:19:02 +01:00
Eric Ye	89be4c3de8	No public description PiperOrigin-RevId: 617315030	2024-03-21 04:18:36 +01:00
Ville Sundell	546519c855	Added a missing space in app.h When the user runs "--help", they see the non-existent word "compressingnew". This is because of a missing space, which is now added, resulting in "compressing new".	2024-03-21 00:39:45 +02:00
Jan Wassenberg	06cea2bcdb	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 23:37:39 +01:00
Eric Ye	ffd02c59ad	No public description PiperOrigin-RevId: 617315030	2024-03-20 23:37:12 +01:00
Jan Wassenberg	7d5364bb80	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 11:31:59 -07:00
RangerUFO	6923aec853	Add MQA support	2024-03-20 18:17:24 +08:00
RangerUFO	130e1f678f	Adjust vocab size to be the same as gemma_pytorch	2024-03-20 18:17:24 +08:00
Copybara-Service	a0f316d853	Merge pull request #95 from google:conversion PiperOrigin-RevId: 615448039	2024-03-13 09:37:36 -07:00
pculliton	f520e5c25c	Remove WIP messages.	2024-03-13 11:36:19 -04:00
Copybara-Service	0221956b2e	Merge pull request #87 from google:refactor-tidy PiperOrigin-RevId: 615204427	2024-03-12 16:10:47 -07:00
Phil Culliton	b6831a2256	Fixed 7B conversion.	2024-03-12 21:12:28 +00:00
austinvhuang	4aa8d0584e	Merge branch 'dev' into refactor-tidy	2024-03-12 15:01:46 -04:00
Copybara-Service	ccd055e06b	Merge pull request #82 from google:examples PiperOrigin-RevId: 615066980	2024-03-12 09:24:24 -07:00
Jan Wassenberg	0d406061c0	Detect and print build type. Refs #88 PiperOrigin-RevId: 614906000	2024-03-11 21:58:10 -07:00
austinvhuang	60d054e041	move arg definitions out of gemma.h to app.h	2024-03-10 23:49:25 -04:00
Phil Culliton	2161908f50	Added 7B support and args parsing. Still todo: more testing of 7B conversion.	2024-03-07 22:34:14 +00:00
austinvhuang	10f7a086aa	[WIP] decouple GemmaImpl from CLI args	2024-03-06 15:06:41 -05:00
Phil Culliton	c93e1a1e4d	Resolved layer ordering, reshaping, MQA->MHA, and quantization. Works only for 2B.	2024-03-05 17:54:55 +00:00
austinvhuang	3c69695c1e	transformations and validations (wip)	2024-03-02 14:46:51 -05:00
austinvhuang	7d7d43e661	converter transformations (wip)	2024-03-02 08:11:55 -05:00
austinvhuang	5be9a2243f	initial (wip) convert_weights script from pytorch	2024-03-01 15:52:51 -05:00
austinvhuang	0ea7b993de	remove --log fixing https://github.com/google/gemma.cpp/issues/59 , improve command line args help, add copybara #include sort guards in more source files, add README sections on running faster and related projects	2024-02-28 15:18:40 -05:00
Jan Wassenberg	272f17ddb3	Warning fixes: unused member, cast, unused function PiperOrigin-RevId: 611074887	2024-02-28 05:54:22 -08:00
Copybara-Service	1a1dd90287	Merge pull request #33 from shirayu:add_eot_option PiperOrigin-RevId: 610838070	2024-02-27 12:32:01 -08:00
Jan Wassenberg	179ecf9e78	Warn instead of assert for setaffinity. Fixes #49 PiperOrigin-RevId: 610638517	2024-02-26 22:46:11 -08:00
Dan Zheng	4c155bd3df	Restore reverted changes. Sync to `84444c93a4`. PiperOrigin-RevId: 610263918	2024-02-25 19:32:07 -08:00
Silvio Traversaro	696597383c	Copybara import of the project: -- `19694e1f2e` by Silvio Traversaro <silvio@traversaro.it>: Do not pass explicitly -O2 flag to compiler in Release build COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gemma.cpp/pull/3 from traversaro:patch-1 `19694e1f2e` PiperOrigin-RevId: 610096914	2024-02-24 20:41:33 -08:00
Dan Zheng	84444c93a4	Revert "Copybara configuration update." This reverts commit `c03b5da542`. Restore lost changes due to improper Copybara syncing.	2024-02-24 15:15:14 -08:00
Dan Zheng	c03b5da542	Copybara configuration update. PiperOrigin-RevId: 609931218	2024-02-24 12:02:47 -08:00
Yuta Hayashibe	1a95cf3274	Add --eot_line option	2024-02-24 23:27:33 +09:00
David Coles	39e385782c	Allow building on Windows using `clang-cl` toolchain It's not possible to build `gemma.cpp` with the standard MSVC front-end as it doesn't support arrays more than `0x7ffffffff` bytes (see Compiler Error C2148), however this isn't a problem with the optional Visual Studio Clang/LLVM frontend. This can be specified using the `-T` flag when running CMake: ``` $ cmake -B build -T ClangCL $ cmake --build build --config Release ``` Windows doesn't provide `pread`/`pwrite` so this must be emulated using the `ReadFile`/`WriteFile` Win32 APIs. `_CRT_SECURE_NO_WARNINGS` is defined to prevent a large number of warnings about using "depricated" function names (e.g. `close` instead of `_close`). `NOMINMAX` is defined to prevent the `min`/`max` macros from `windows.h` from conflicting with expressions like `std::min`. Generally libraries should avoid including `windows.h` in their public headers or define `WIN32_LEAN_AND_MEAN` before including the `windows.h` header, but this unfortunately isn't always the case.	2024-02-23 00:38:54 -08:00
Austin Huang	e29cd566cf	initial commit	2024-02-21 03:31:22 +00:00

172 Commits