gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Nanubala Gnana Sai	29e3a1bba9	Merge `51a708e957` into `5bc356f18f`	2024-12-18 12:16:32 +00:00
Daniel Keysers	62c70d6715	Rename ModelTraining to PromptWrapping which is a more accurate name. PiperOrigin-RevId: 705881500	2024-12-13 07:45:59 -08:00
Daniel Keysers	331d2ccc02	Add support for 448px resolution to PaliGemma and PaliGemma2. PiperOrigin-RevId: 704361579	2024-12-09 11:38:10 -08:00
Phil Culliton	9dfe2a76be	Internal change PiperOrigin-RevId: 702961613	2024-12-04 20:41:47 -08:00
Nanubala Gnana Sai	e8601b2415	Merge branch 'dev' into feature/ISS-60/implement-self-extend	2024-11-19 23:41:45 +05:30
Nanubala Gnana Sai	14d62b0098	Added support for mutable ModelConfig, run.cc can support runtime self extend config	2024-11-19 22:33:27 +05:30
Ray Smith	7d685a267f	Added pybind for configs. Added ability to test configs for equality. PiperOrigin-RevId: 697572671	2024-11-18 04:03:51 -08:00
Daniel Keysers	719699f132	Make top_k a runtime argument (instead of a model argument). PiperOrigin-RevId: 696170691	2024-11-13 09:48:59 -08:00
Nanubala Gnana Sai	397952f918	Merge branch 'dev' into feature/ISS-60/implement-self-extend	2024-11-06 00:30:35 +05:30
Jan Wassenberg	868b01601f	Simpler MatMul interface, vocab types, Tristate for use_spinning. Add Extents2D, Range2D vocab types. Matmul uses ConstMat for inputs and RowPtr for output. Move RowVectorBatch to basics.h. Separate threading.cc. Fix topology string: report cores not LPs, and #HT. Move QStride/IsMHA into LayerConfig. ImageTokens does not require make_unique. matmul_test: no longer require template args. PiperOrigin-RevId: 692963605	2024-11-04 07:48:29 -08:00
Daniel Keysers	583bd93e9a	Factor out addition of ViTConfig to a ModelConfig. Use ModelConfig values for ImageTokens. Output timing info for image token generation. Add a method to copy image data into Image class directly. Minor changes: pipe ModelTraining to more places. PiperOrigin-RevId: 690572283	2024-10-28 05:29:33 -07:00
Nanubala Gnana Sai	f77e61e514	Use runtime config to setup self extend	2024-10-19 13:23:12 +05:30
Nanubala Gnana Sai	fbba1972d0	remove compile time config	2024-10-19 11:58:32 +05:30
Nanubala Gnana Sai	8cf3966be4	compile success: set default self extend values in noSSM and griffin	2024-10-19 11:35:12 +05:30
Daniel Keysers	c6384574db	Fix PaliGemma's GenerateImageTokensT(). Move image related config values from LayerConfig to ModelConfig. Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum() PiperOrigin-RevId: 687210519	2024-10-18 01:34:13 -07:00
Ray Smith	0d68555f87	Eliminated TConfig. Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively. Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly. Adjusted WeightsWrapper and ForwardLayer etc to match. The only remaining template arg is the weight type. This enables all the instantiations to be deleted, apart from one per type. It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately. Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively. PiperOrigin-RevId: 686870060	2024-10-17 05:04:22 -07:00
Jan Wassenberg	6ab3ff5bde	Minor cleanup, Windows+Bazel build fixes: add app.h comment; compress-inl: remove unused typedef; gemma-inl: add missing HWY_ATTR and cast; separate sum-inl.h and basics.h headers; replace more hwy::bfloat16_t with BF16; update include pragmas; update dot_test thresholds; update Highway version in Bazel for HWY_RCAST_ALIGNED fix. PiperOrigin-RevId: 684464326	2024-10-10 09:05:06 -07:00
Krzysztof Ostrowski	12291e1ac0	Internal change. PiperOrigin-RevId: 681583569	2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski	b3239bf509	Internal change. PiperOrigin-RevId: 681530185	2024-10-02 11:33:06 -07:00
Daniel Keysers	606427022c	Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct. PiperOrigin-RevId: 679049377	2024-09-26 01:55:26 -07:00
Daniel Keysers	f8835fe4a4	Add support for PaliGemma Vision-LM (224x224) to gemma.cpp. See https://arxiv.org/abs/2407.07726 for a description of the model. Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that. PiperOrigin-RevId: 677841119	2024-09-23 10:09:38 -07:00
Jan Wassenberg	301dc8067a	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul. Supports converting all weight/activation formats to native MulT (bf16/f32). Also: ConstMat/MutableMat for const correctness. Move RowVectorBatch to allocator.h so it can be used from Matmul. Add matmul.h so MatMulEnv can be used from Activations. Remove kMaxThreads, detect from PerClusterPools. Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h. zen4 new: 64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp: 616.6 GFLOPS; 64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp: 460.7 GFLOPS; 64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 598.6 GFLOPS; 64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 435.6 GFLOPS. zen4 old: 64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 257.5 GFLOPS; 64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 231.9 GFLOPS. PiperOrigin-RevId: 663729812	2024-08-16 07:52:20 -07:00
Apoorv Reddy	fd1b0743a7	Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B. This is to make it clear that these models are part of the Gemma2 family of models. PiperOrigin-RevId: 661181682	2024-08-09 02:09:06 -07:00
Phil Culliton	1982a6ba00	Internal change PiperOrigin-RevId: 657831926	2024-07-30 20:24:54 -07:00
Daniel Keysers	5a751a9a44	Update gemma-27b to the correct query scaling. PiperOrigin-RevId: 653201646	2024-07-17 05:43:52 -07:00
The gemma.cpp Authors	df3fb70802	Improve readability with RepeatedAttentionWindowSizes PiperOrigin-RevId: 651431738	2024-07-11 09:11:46 -07:00
Kan Wu	f519ab6693	Refactor configurables. PiperOrigin-RevId: 651259154	2024-07-10 21:30:58 -07:00
Kan Wu	7e4b20455e	Add sliding window attention for Gemma 2. PiperOrigin-RevId: 648778253	2024-07-02 11:08:03 -07:00
Jan Wassenberg	e588a7f45d	Add config for att/final cap, skip max-subtract. Fixes #278. Also update includes/deps for backprop/. PiperOrigin-RevId: 648399222	2024-07-01 09:45:26 -07:00
Paul Chang	8ac5d66575	Introduce new Gemma 9B and 27B configs PiperOrigin-RevId: 647299080	2024-06-27 06:45:24 -07:00
The gemma.cpp Authors	a85725614a	Refactor kCachePosSize and kCacheLayerSize into separate functors. PiperOrigin-RevId: 645048519	2024-06-20 08:52:08 -07:00
Paul Chang	d7d9d14f0e	Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly. For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero and kGemmaLayers is just the number of layers. PiperOrigin-RevId: 644384531	2024-06-18 07:52:52 -07:00
Jan Wassenberg	29c0c574e6	Integrate matmul into FFW: 4.3x prefill speedup. Before, bf16: 27.2929 prefill tokens / sec, 17.2114 tokens / sec. After, bf16: 116.496 prefill tokens / sec, 17.5391 tokens / sec. PiperOrigin-RevId: 643328437	2024-06-14 06:32:26 -07:00
Jan Wassenberg	c15ff9529c	Reduce duplication in Config* by inheriting no-SSM PiperOrigin-RevId: 643030629	2024-06-13 09:48:56 -07:00
Jan Wassenberg	f9b390b134	Support all weight types in a single binary. This changes the command line flags, but the default value retains the previous behavior. Also add a CreateGemma helper to enable extra args without interface changes. PiperOrigin-RevId: 641266411	2024-06-07 09:04:45 -07:00
Jan Wassenberg	57c2cd8b52	Simplifications: remove GemmaInterface and GemmaImpl. Split common and weights into separate lib. Remove common-inl (does not have to be SIMD code), activations.cc. Centralize switch(Model) to avoid duplication. Move CompressWeightsT to compress_weights.cc. Move LoadWeights to weights.cc. PiperOrigin-RevId: 640869202	2024-06-06 05:54:21 -07:00
Zoltan Szabadka	36e4d8bbfe	Add first version of backpropagation support. This is still in progress / experimental: currently it is only implemented for normal gemma MQA attention layers, and no parallelism has been added yet for the backward pass. Since we need to remember all activations from all layers, the forward pass was also reimplemented with a new activation data structure.	2024-06-04 08:37:49 +00:00
Paul Chang	bacba351d4	Support additional scaling PiperOrigin-RevId: 631429113	2024-05-07 08:16:25 -07:00
Jan Wassenberg	12fb2f05cf	Add per-thread even_odd storage for #166. Also inline ProjQ and ProjKV lambdas, add missing includes/deps for ops_test. PiperOrigin-RevId: 629460608	2024-04-30 10:42:23 -07:00
Paul Chang	2d4de6b08b	Support absolute positional embeddings from vanilla transformer PiperOrigin-RevId: 628100831	2024-04-25 09:32:14 -07:00
RangerUFO	2099b37732	Change `NumGemmaLayers` and `NumGriffinLayers` to constants in configs	2024-04-09 20:44:41 +08:00
Jan Wassenberg	a982ec1287	Move code to gemma/ so we can remove error-prone copybara: comments. Also fix includes and Lint warnings. PiperOrigin-RevId: 623127487	2024-04-09 04:45:42 -07:00

42 Commits