gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Ray Smith	85958f5fd3	Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. Definition of array size is moved to the constructor. Allocation is separate and parallelized. All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted. Replaced all previous ForEachTensor functions with a single unified function. PiperOrigin-RevId: 684451604	2024-10-10 08:22:30 -07:00
Jan Wassenberg	2c28b18eb0	Add NestedPools: one per socket/cluster Use in dot_test app.h: add new flags and rename num_threads to max_threads matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases PiperOrigin-RevId: 683216386	2024-10-07 09:40:19 -07:00
Daniel Keysers	f8835fe4a4	Add support for PaliGemma Vision-LM (224x224) to gemma.cpp See https://arxiv.org/abs/2407.07726 for a description of the model. Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that. PiperOrigin-RevId: 677841119	2024-09-23 10:09:38 -07:00
Jan Wassenberg	8c0a8834c1	Major compression update, arbitrary-len unpack + new Dot Compression: * Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad * New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test * Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking * NUQ: support arbitrary-length enc/dec * New compression/shared, remove sfp.h and nuq.h * Move Store2 into Traits and provide Compress2 wrapper * Remove unused Decompress()-with-pool overload * Simplify CompressedArrayLen, rename to CompressedArrayElements * Remove unused DistortionStats b_l1_ Misc: * Add compensated and Kahan dot, support any length * Use same Dot function everywhere * Move exact arithmetic functions into fp_arith * use FloatPtr and MatPtr typedefs in tests; less stack usage * Rename args to packed/raw * Remove Traits::Name, instead TypeName<T>() * Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream PiperOrigin-RevId: 672868468	2024-09-10 02:22:19 -07:00
Jan Wassenberg	c29e9752c7	Refactor/cleanup, remove even_odd * New compression/shared.h, remove sfp.h * Remove unused DistortionStats b_l1_ * Move exact arithmetic functions into fp_arith * Remove even_odd optimization for MatVec (mostly unused) * use BF16 typedef more widely * Add kMaxSFP constant PiperOrigin-RevId: 670996386	2024-09-04 09:25:13 -07:00
Jan Wassenberg	4033ed9e78	Avoid duplication of RMSNorm, support all activation/weight types Add test for RMSNorm Rename VectorizedRopeAndMulBy -> RopeAndMulBy Move test_util to util/ PiperOrigin-RevId: 668332927	2024-08-28 01:26:55 -07:00
Jan Wassenberg	2308514e5a	Experiment with compensated dot product. ULP difference vs exact is 0..1, vs 200-5000 for previous. Runtime overhead is 2.5-4x for f32 input. PiperOrigin-RevId: 668084019	2024-08-27 12:05:35 -07:00
Jan Wassenberg	301dc8067a	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul Supports converting all weight/activation formats to native MulT (bf16/f32) Also: - ConstMat/MutableMat for const correctness - Move RowVectorBatch to allocator.h so it can be used from Matmul - Add matmul.h so MatMulEnv can be used from Activations - Remove kMaxThreads, detect from PerClusterPools - Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h ``` zen4 new 64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp: 616.6 GFLOPS. 64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp: 460.7 GFLOPS. 64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 598.6 GFLOPS. 64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 435.6 GFLOPS. zen4 old 64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 257.5 GFLOPS. 64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 231.9 GFLOPS. ``` PiperOrigin-RevId: 663729812	2024-08-16 07:52:20 -07:00
RangerUFO	ea72575e56	Fix build issues when tests are enabled	2024-08-12 18:50:23 +02:00
Phil Culliton	1982a6ba00	Internal change PiperOrigin-RevId: 657831926	2024-07-30 20:24:54 -07:00
Jan Wassenberg	a24eda8d02	Split matmul into matvec; add large matrix benchmark Rename var names to row/col for more clarity. Better estimate error tolerance via max abs col sum. PiperOrigin-RevId: 657601791	2024-07-30 08:29:11 -07:00
Jan Wassenberg	85cac13fb1	Split up ops.h into ops/ops-inl and matmul-inl PiperOrigin-RevId: 654068303	2024-07-19 11:21:48 -07:00
Jan Wassenberg	cbb67b4ee0	Move benchmark_helper to evals/, weights_raw to compression/. PiperOrigin-RevId: 650155983	2024-07-08 01:13:23 -07:00
Jan Wassenberg	f823371691	Cleanup: move util/compress and convert_weights to compression/ Also remove unused models/, lint convert_weights PiperOrigin-RevId: 649613088	2024-07-05 04:16:52 -07:00
Jan Wassenberg	c7c3daa624	7x compile time speedup: shard gemma.cc Use overloaded functions defined in gemma/instantiations. Also split out activations.h. PiperOrigin-RevId: 649053122	2024-07-03 06:35:04 -07:00
Jan Wassenberg	09a7e75ead	Prep for sharding gemma.cc: split into kv_cache, tokenizer. Move activations.h to backprop/ to make space for another activations.h. PiperOrigin-RevId: 648744500	2024-07-02 09:31:06 -07:00
Jan Wassenberg	af8eb2fde3	Declutter gemma/ directory, move binaries to evals/ and util/. PiperOrigin-RevId: 648400795	2024-07-01 09:51:04 -07:00
Jan Wassenberg	7d0720675f	Move raw_weights into separate header, used mainly by compress_weights. Fix warnings in backprop/* (include) PiperOrigin-RevId: 643983136	2024-06-17 06:17:02 -07:00
Zoltan Szabadka	d98523187c	Add benchmark dependency to cmake build.	2024-06-12 08:14:29 +00:00
Zoltan Szabadka	9c869c4655	Revert "Add benchmark dependency to cmake build" This reverts commit `12ce91a163`. Reason: accidentally pushed directly to dev branch, will redo with a PR and copybara-import.	2024-06-12 07:56:03 +00:00
Zoltan Szabadka	12ce91a163	Add benchmark dependency to cmake build	2024-06-12 07:09:15 +00:00
Ray Smith	bdf33c7008	Updated benchmarks.cc to recent changes to Gemma API. PiperOrigin-RevId: 642285902	2024-06-11 08:55:40 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Zoltan Szabadka	465998d25a	Add support for custom sampling function to runtime config. With this addition the ComputeCrossEntropy function can be moved to its own library, because now we can compute it using only the public API functions from gemma.h	2024-06-07 11:45:07 +00:00
Zoltan Szabadka	c004799cdc	Add Adam optimizer. Drive-by: Fix compilation errors and tests for backprop functions.	2024-06-06 18:41:36 +00:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl Split common and weights into separate lib Remove common-inl (does not have to be SIMD code), activations.cc Centralize switch(Model) to avoid duplication Move CompressWeightsT to compress_weights.cc Move LoadWeights to weights.cc PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Zoltan Szabadka	df01700b54	Move the backpropagation code to its own directory	2024-06-04 10:20:16 +00:00
Zoltan Szabadka	36e4d8bbfe	Add first version of backpropagation support. This is still in progress / experimental, currently it is only implemented for normal gemma MQA attention layers, and no parallelism is added yet for backward pass. Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.	2024-06-04 08:37:49 +00:00
Jan Wassenberg	a44cbdadc2	Update to Highway 1.2 for topology/VQSelect Also fix unused-warning in compress-inl. PiperOrigin-RevId: 639116915	2024-05-31 12:29:10 -07:00
Wang Xinping	2c038e1285	work with cmake install	2024-05-03 23:44:12 +08:00
Jan Wassenberg	e9a0caed87	Further improve IO, enable multiple backends without -D. Move Path into io.h and use for opening files. Removes dependency of gemma_lib on args. Separate Windows codepath instead of emulating POSIX functions. Plus lint fixes. PiperOrigin-RevId: 626279004	2024-04-19 00:40:29 -07:00
Jan Wassenberg	a8ceb75f43	Improved IO abstraction layer Move to unique_ptr-like File class. Move `if OS_WIN` into wrapper functions. exists -> Exists. PiperOrigin-RevId: 625923056	2024-04-17 23:15:07 -07:00
Andrey Mikhaylov	03284d752e	Added layers output functionality to gemma and a binary debug_output to save the outputs to a json file.	2024-04-12 15:39:16 +00:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00
Luca Versari	5862d1f995	Add a benchmark and additional tests. Also add a script to help running sanitizer builds, and do some cleanup. Co-authored-by: Andrey Mikhaylov <amik@google.com> Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Sami Boukortt <sboukortt@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 12:54:52 +02:00
Luca Versari	6cdb8a45a0	Add more ops: Sigmoid, (Two)MatVecAdd. Faster TwoMatVec. drive-by: some build system simplifications Co-authored-by: Andrey Mikhaylov <amik@google.com> Co-authored-by: Lode Vandevenne <lode@google.com> Co-authored-by: Martin Bruse <zondolfin@gmail.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-05 12:27:31 +02:00
Zoltan Szabadka	b670d43e4f	Add standalone tool to compress weights. Co-authored-by: Eugene Kliuchnikov <eustas@google.com>	2024-04-03 14:54:08 +00:00
enum-class	aa6e88e591	add unit tests for ops	2024-03-23 21:09:19 +08:00
Jan Wassenberg	24add61dd9	Fix SFP/NUQ for bf16 rounding in Highway SFP: Avoid rounding twice, and more robust TestDot. NUQ: also more robust SNR, minor touchups to header. PiperOrigin-RevId: 618030096	2024-03-21 19:06:19 -07:00
enum-class	06dd013397	Add clang-tidy, fix narrowing issues, fix constness	2024-02-28 20:04:09 +08:00
Kewde	4e2efbcbd8	Copybara import of the project: -- `f4f2ff3c1a` by kewde <kewde@particl.io>: fix: add -fPIC to libgemma COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gemma.cpp/pull/42 from kewde:kewde/enable-fpic `f4f2ff3c1a` PiperOrigin-RevId: 610416597	2024-02-26 08:31:06 -08:00
Dan Zheng	4c155bd3df	Restore reverted changes. Sync to `84444c93a4`. PiperOrigin-RevId: 610263918	2024-02-25 19:32:07 -08:00
Silvio Traversaro	696597383c	Copybara import of the project: -- `19694e1f2e` by Silvio Traversaro <silvio@traversaro.it>: Do not pass explicitly -O2 flag to compiler in Release build COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gemma.cpp/pull/3 from traversaro:patch-1 `19694e1f2e` PiperOrigin-RevId: 610096914	2024-02-24 20:41:33 -08:00
Dan Zheng	84444c93a4	Revert "Copybara configuration update." This reverts commit `c03b5da542`. Restore lost changes due to improper Copybara syncing.	2024-02-24 15:15:14 -08:00
Dan Zheng	c03b5da542	Copybara configuration update. PiperOrigin-RevId: 609931218	2024-02-24 12:02:47 -08:00
Austin Huang	830dda09b0	Merge pull request #3 from traversaro/patch-1 Do not pass explicitly -O2 flag to compiler in Release build	2024-02-24 12:28:03 -05:00
Austin Huang	34b22c56f5	Merge pull request #6 from dcoles/clang-cl Allow building on Windows using `clang-cl` toolchain	2024-02-24 12:27:40 -05:00
Yuta Hayashibe	65b20f9c55	Fix typos	2024-02-24 19:15:52 +09:00
David Coles	39e385782c	Allow building on Windows using `clang-cl` toolchain It's not possible to build `gemma.cpp` with the standard MSVC front-end as it doesn't support arrays larger than `0x7fffffff` bytes (see Compiler Error C2148), however this isn't a problem with the optional Visual Studio Clang/LLVM frontend. This can be specified using the `-T` flag when running CMake: ``` $ cmake -B build -T ClangCL $ cmake --build build --config Release ``` Windows doesn't provide `pread`/`pwrite` so this must be emulated using the `ReadFile`/`WriteFile` Win32 APIs. `_CRT_SECURE_NO_WARNINGS` is defined to prevent a large number of warnings about using "deprecated" function names (e.g. `close` instead of `_close`). `NOMINMAX` is defined to prevent the `min`/`max` macros from `windows.h` from conflicting with expressions like `std::min`. Generally libraries should avoid including `windows.h` in their public headers or define `WIN32_LEAN_AND_MEAN` before including the `windows.h` header, but this unfortunately isn't always the case.	2024-02-23 00:38:54 -08:00
Silvio Traversaro	19694e1f2e	Do not pass explicitly -O2 flag to compiler in Release build	2024-02-21 21:02:48 +01:00

51 Commits