gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	8e028632f7	0.98x prefill: refactor in prep for cache blocking. Slower because we now init tiles of C and accumulate into them. Also remove unused var in optimize_test and use BF16 typedef. PiperOrigin-RevId: 662115916	2024-08-12 09:26:29 -07:00
Jan Wassenberg	1617e1a33d	SFP speedup: 1.14x f32, 1.19x bf16 dot = 1.02x prefill 12->9 ops by recognizing the upper/lower bytes are simply shifted. PiperOrigin-RevId: 659609241	2024-08-05 10:59:13 -07:00
Jan Wassenberg	6ea4232b2e	MatMul cleanup: Mat struct, simplify args. Add large benchmark to test, use 4 threads, skip some targets. Also use Traits::Name instead of typeid. PiperOrigin-RevId: 657496185	2024-07-30 01:55:50 -07:00
Thomas Fischbacher	d9f86f8e4d	Add Python code for converting Griffin Orbax weights. Refs #301 PiperOrigin-RevId: 657296255	2024-07-29 12:53:30 -07:00
The gemma.cpp Authors	c1f243c351	Fix setting scales in Py binding PiperOrigin-RevId: 655284183	2024-07-23 13:32:50 -07:00
Daniel Keysers	e87e65ca45	Add scale parameter to MatMul. Add accessor to CompressedArray that asserts the scale is 1 and use it. PiperOrigin-RevId: 653604840	2024-07-18 06:58:56 -07:00
Daniel Keysers	ff34370aac	Simplify FFW by using MatMul_4x4_Batch_Add. Affects only the griffin model, where prefill TPS improves by about 70%. PiperOrigin-RevId: 652878176	2024-07-16 09:41:23 -07:00
Andrey Vlasov	3e92088595	Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing SfpCodec::Dec2F and ComressTraits<T>::Decompress2 for all supported types. It also allows to remove one of the specializations of GEMM_4x4_Tile, handling compressed MatB with one function. As before even when MatA is bf16 it is using 32-bit registers for computations. Measurements for a 2b-it sfp-encoded model on a AMD Ryzen Threadripper PRO 3945WX 12-Cores: baseline: ``` 32.6254 prefill tokens / sec 8.91429 tokens / sec 115 milliseconds time to first token ``` this change: ``` 54.3045 prefill tokens / sec 16.8191 tokens / sec 56 milliseconds time to first token ``` PiperOrigin-RevId: 651369694	2024-07-11 05:13:39 -07:00
Kan Wu	f519ab6693	Refactor configurables. PiperOrigin-RevId: 651259154	2024-07-10 21:30:58 -07:00
Jan Wassenberg	cbb67b4ee0	Move benchmark_helper to evals/, weights_raw to compression/. PiperOrigin-RevId: 650155983	2024-07-08 01:13:23 -07:00
Jan Wassenberg	f823371691	Cleanup: move util/compress and convert_weights to compression/ Also remove unused models/, lint convert_weights PiperOrigin-RevId: 649613088	2024-07-05 04:16:52 -07:00
Jan Wassenberg	41efec4dba	Add Py bindings for weight compression TODO: this uses clif instead of pybind11, and depends on absl. PiperOrigin-RevId: 649575815	2024-07-05 01:06:00 -07:00
Jan Wassenberg	d3c6a45b59	Major duplicated code reduction in test/benchmarks Helper functions to tokenize/wrap Move LayersOutputFunc into RuntimeConfig AcceptFunc passes the probability Implement StringFromType using the parser, and verify results match PiperOrigin-RevId: 643255119	2024-06-14 00:16:25 -07:00
Jan Wassenberg	a0e808e341	Add compression/ comments, especially on SFP range PiperOrigin-RevId: 642238720	2024-06-11 05:47:49 -07:00
Jan Wassenberg	5c3e5f7038	Remove no longer required stats.h - use Highway version instead PiperOrigin-RevId: 640440379	2024-06-05 01:37:48 -07:00
Paul Chang	175e389c3c	revert back to HWY_ASSERT for lane constraints, qualify hn::Add PiperOrigin-RevId: 640193239	2024-06-04 10:10:18 -07:00
Jan Wassenberg	4f9155d8c6	Add bf16 matmul support, update naming+test Avoid int32, which can easily overflow for large matrices. Also fix IDE warning in sfp-inl. PiperOrigin-RevId: 640149845	2024-06-04 07:41:46 -07:00
Zoltan Szabadka	36e4d8bbfe	Add first version of backpropagation support. This is still in progress / experimental, currently it is only implemented for normal gemma MQA attention layers, and no parallelism is added yet for backward pass. Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.	2024-06-04 08:37:49 +00:00
Jan Wassenberg	a44cbdadc2	Update to Highway 1.2 for topology/VQSelect Also fix unused-warning in compress-inl. PiperOrigin-RevId: 639116915	2024-05-31 12:29:10 -07:00
Paul Chang	c0643577c3	Minor internal refactoring. PiperOrigin-RevId: 635852078	2024-05-21 10:29:59 -07:00
Paul Chang	cfce314715	Make BlobWriter::Add() accept const void* PiperOrigin-RevId: 634780483	2024-05-17 08:11:06 -07:00
Jan Wassenberg	22fe9809ac	Fix SVE build: add missing hn:: PiperOrigin-RevId: 632481097	2024-05-10 06:49:26 -07:00
Jan Wassenberg	c5c9fc300c	Enable even/odd for SFP. Refs #166 Disable it for float32 because there is not enough benefit. PiperOrigin-RevId: 631788326	2024-05-08 07:09:06 -07:00
Jan Wassenberg	f6d02b2870	Fix RecurrentGemma (refs #166) - one Dot was ignoring scale. Remove extra Dot() overload MatVecAdd always adds, use MatVecT<kAdd> if conditional. Remove unused MatVecAddLoop and MatVecLoop No longer tsan-verify even_odd PiperOrigin-RevId: 631377279	2024-05-07 04:40:42 -07:00
Jan Wassenberg	b5a9ade75f	2x speedup of SFP decode (1.4x overall) on AVX3_DL+. Thanks @nzmichaelh for suggesting table lookups! PiperOrigin-RevId: 631337524	2024-05-07 01:46:43 -07:00
Zoltan Szabadka	429eb78512	Remove unused vars.	2024-05-03 13:37:17 +00:00
Sam Kaufman	f608337fef	Remove Bf16ToF32EO and use PromoteEvenTo and PromoteOddTo.	2024-04-29 14:13:07 -07:00
Sam Kaufman	5cb63346aa	supports_eo -> kSupportsEvenOdd	2024-04-29 12:51:35 -07:00
Sam Kaufman	0816a1070d	Even-odd layout MatVecs for bf16 weights.	2024-04-28 20:09:25 -07:00
Paul Chang	e8f59bb411	Fix underflow in NUQ ClusterCost() PiperOrigin-RevId: 628137904	2024-04-25 11:28:51 -07:00
Jan Wassenberg	e9a0caed87	Further improve IO, enable multiple backends without -D. Move Path into io.h and use for opening files. Removes dependency of gemma_lib on args. Separate Windows codepath instead of emulating POSIX functions. Plus lint fixes. PiperOrigin-RevId: 626279004	2024-04-19 00:40:29 -07:00
Jan Wassenberg	a8ceb75f43	Improved IO abstraction layer Move to unique_ptr-like File class. Move `if OS_WIN` into wrapper functions. exists -> Exists. PiperOrigin-RevId: 625923056	2024-04-17 23:15:07 -07:00
Jan Wassenberg	a939b5fc9f	Update distortion.h to weighted average, add distortion_test. More thorough checks in sfp_test and nuq_test. nuq_test: use deterministic input generator. PiperOrigin-RevId: 625602019	2024-04-17 01:44:19 -07:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00
Luca Versari	4c23932289	Improve weight handling. - Allow scaling of SFP weights - Allow using uncompressed weights - Do not try to compress weights in the main model calls - Reduce code duplication in weight handling with some macros Co-authored-by: Eugene Kliuchnikov <eustas@google.com> Co-authored-by: Thomas Fischbacher <tfish@google.com> Co-authored-by: Zoltan Szabadka <szabadka@google.com>	2024-04-06 11:08:47 +02:00
Jan Wassenberg	7122afed5a	Add note on weight update and improve error message PiperOrigin-RevId: 621849989	2024-04-04 07:17:27 -07:00
Jan Wassenberg	61e031fe98	Towards building tests without GUnit Refs #29 PiperOrigin-RevId: 618032987	2024-03-21 19:28:02 -07:00
Jan Wassenberg	24add61dd9	Fix SFP/NUQ for bf16 rounding in Highway SFP: Avoid rounding twice, and more robust TestDot. NUQ: also more robust SNR, minor touchups to header. PiperOrigin-RevId: 618030096	2024-03-21 19:06:19 -07:00
Jan Wassenberg	ba86c8d590	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-21 04:19:02 +01:00
Eric Ye	89be4c3de8	No public description PiperOrigin-RevId: 617315030	2024-03-21 04:18:36 +01:00
Jan Wassenberg	30b8a3c1ac	Fix build for RPi, missing hn::. Refs #112 , thanks long568 PiperOrigin-RevId: 617704418	2024-03-20 20:07:49 -07:00
Jan Wassenberg	06cea2bcdb	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 23:37:39 +01:00
Eric Ye	ffd02c59ad	No public description PiperOrigin-RevId: 617315030	2024-03-20 23:37:12 +01:00
Jan Wassenberg	7d5364bb80	Remove obsolete copybara tags, faster bazel builds (debug) PiperOrigin-RevId: 617576799	2024-03-20 11:31:59 -07:00
Jan Wassenberg	fce5c8c967	Avoid fadvise on older Android. Fixes #84 PiperOrigin-RevId: 613815953	2024-03-07 22:19:22 -08:00
Jan Wassenberg	bb9b023502	Support Bazel builds. Fixes #16 Also fix nuq/sfp-inl: warning, cast, and disable SCALAR PiperOrigin-RevId: 612704056	2024-03-04 22:07:25 -08:00
Copybara-Service	cd7468199c	Merge pull request #65 from enum-class:narrowing-issues PiperOrigin-RevId: 612279564	2024-03-03 18:51:59 -08:00
Jan Wassenberg	b6aaf6bbb8	Fix for Android's 32-bit off_t. Fixes #62 PiperOrigin-RevId: 611249534	2024-02-28 15:30:19 -08:00
Jan Wassenberg	272f17ddb3	Warning fixes: unused member, cast, unused function PiperOrigin-RevId: 611074887	2024-02-28 05:54:22 -08:00
enum-class	06dd013397	Add clang-tidy, fix narrowing issues, fix constness	2024-02-28 20:04:09 +08:00
Jan Wassenberg	b3fecef45d	Warning fix: sign cast PiperOrigin-RevId: 610635789	2024-02-26 22:31:39 -08:00
Dan Zheng	4c155bd3df	Restore reverted changes. Sync to `84444c93a4`. PiperOrigin-RevId: 610263918	2024-02-25 19:32:07 -08:00
Silvio Traversaro	696597383c	Copybara import of the project: -- `19694e1f2e` by Silvio Traversaro <silvio@traversaro.it>: Do not pass explicitly -O2 flag to compiler in Release build COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gemma.cpp/pull/3 from traversaro:patch-1 `19694e1f2e` PiperOrigin-RevId: 610096914	2024-02-24 20:41:33 -08:00
Dan Zheng	84444c93a4	Revert "Copybara configuration update." This reverts commit `c03b5da542`. Restore lost changes due to improper Copybara syncing.	2024-02-24 15:15:14 -08:00
Dan Zheng	c03b5da542	Copybara configuration update. PiperOrigin-RevId: 609931218	2024-02-24 12:02:47 -08:00
Austin Huang	34b22c56f5	Merge pull request #6 from dcoles/clang-cl Allow building on Windows using `clang-cl` toolchain	2024-02-24 12:27:40 -05:00
Ikko Eltociear Ashimine	e4e02a17d4	Copybara import of the project: -- `5c7dbc6599` by Ikko Eltociear Ashimine <eltociear@gmail.com>: Update build.yml dispath -> dispatch COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gemma.cpp/pull/22 from eltociear:patch-1 `5c7dbc6599` PiperOrigin-RevId: 609827161	2024-02-23 22:32:51 -08:00
David Coles	39e385782c	Allow building on Windows using `clang-cl` toolchain It's not possible to build `gemma.cpp` with the standard MSVC front-end as it doesn't support arrays more than `0x7ffffffff` bytes (see Compiler Error C2148), however this isn't a problem with the optional Visual Studio Clang/LLVM frontend. This can be specified using the `-T` flag when running CMake: ``` $ cmake -B build -T ClangCL $ cmake --build build --config Release ``` Windows doesn't provide `pread`/`pwrite` so this must be emulated using the `ReadFile`/`WriteFile` Win32 APIs. `_CRT_SECURE_NO_WARNINGS` is defined to prevent a large number of warnings about using "depricated" function names (e.g. `close` instead of `_close`). `NOMINMAX` is defined to prevent the `min`/`max` macros from `windows.h` from conflicting with expressions like `std::min`. Generally libraries should avoid including `windows.h` in their public headers or define `WIN32_LEAN_AND_MEAN` before including the `windows.h` header, but this unfortunately isn't always the case.	2024-02-23 00:38:54 -08:00
The gemma_cpp Authors	587e80f276	Code update PiperOrigin-RevId: 609394329	2024-02-22 09:19:47 -08:00
Austin Huang	e29cd566cf	initial commit	2024-02-21 03:31:22 +00:00


160 Commits