Commit Graph

53 Commits

Author SHA1 Message Date
Jan Wassenberg 9a02d6be68 Add --prompt_file and testdata for it. Refs #608
Linux terminals truncate input after 4096 chars.
testdata is Frankenstein from project Gutenberg, which are long out of copyright.

Also fix loss of coherence after long context caused by incorrect IsGlobalLayer.
Move that to config.h and use max_seq_len as the initializer to make this clear.

Also avoid dynamic allocation for GriffinActivations.

PiperOrigin-RevId: 772333225
2025-06-16 23:41:07 -07:00
Jan Wassenberg c027a45a2e MatPtr-ify KV, shared div_seq_len, --seq_len flag
PiperOrigin-RevId: 770194455
2025-06-11 09:49:38 -07:00
Jan Wassenberg 839a642992 Fix paligemma_test, refs #588
Detect PaliGemma models from layer names
Remove unused allocator arg from CreateInvTimescale
matmul: only warn once about dim divisibility
Print config also in tests if --verbosity 2
PiperOrigin-RevId: 766605131
2025-06-03 04:45:22 -07:00
Jan Wassenberg cf4d7ceb82 1.16x decode speedup: remove last MatVec in Attention
Precompute row pointers.
Remove no longer used MHA support; QStride -> qkv_dim.
Remove RowPtr from MatMul interface, use only MatPtrT.
Require opt-in define for NUQ to speed up builds.
Also fix io.cc on Windows.

PiperOrigin-RevId: 766228108
2025-06-02 09:40:29 -07:00
Jan Wassenberg 3890eb5412 Remove backprop/
Also remove MatPtrT::Packed(); use PackedScale1 instead where const, or Row(0).

PiperOrigin-RevId: 764243198
2025-05-28 07:01:17 -07:00
Jan Wassenberg 2038dfd9cc Minor: rename compression/shared -> types.h
PiperOrigin-RevId: 758199851
2025-05-13 06:53:21 -07:00
Jan Wassenberg 252a4e955e Remove support for Gemma 1 and PaliGemma 1 models, superseded by (Pali)Gemma 2.
PiperOrigin-RevId: 756671308
2025-05-09 02:17:27 -07:00
Jan Wassenberg c8d92948f4 Move fields, io* and blob* from compression/ into io/
PiperOrigin-RevId: 755445712
2025-05-06 11:17:19 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Jan Wassenberg 160a5824fb Cleanup: include fixes/comments, fix leak, vector reserve
Also remove unused RowSpan
configs.cc: Assign prompt wrapping to ModelConfig
configs.h: simplify EnumValid via sentinel

PiperOrigin-RevId: 750278497
2025-04-22 12:01:46 -07:00
Jan Wassenberg 87a658b1c6 Minor cleanup, on-demand NUQ buffer allocation
threading_context: add profiler
compress-inl: add constexpr, on-demand alloc NUQ buffer
gemma_py: model->gemma
Move ScaleWeights to compress.cc
Move PromptWrapping to configs.h
PiperOrigin-RevId: 748347896
2025-04-16 10:49:43 -07:00
Phil Culliton 05b1cce9f7 Add support for a secondary EOS token
PiperOrigin-RevId: 738898976
2025-03-20 12:28:31 -07:00
Phil Culliton 4ab601da10 Internal change.
PiperOrigin-RevId: 736015810
2025-03-11 23:20:20 -07:00
Phil Culliton 9d83ff202e Internal change.
PiperOrigin-RevId: 736014152
2025-03-11 23:10:48 -07:00
Jan Wassenberg b18bd781f6 Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts
PiperOrigin-RevId: 724340518
2025-02-07 07:38:55 -08:00
Phil Culliton 9646edc908 Internal change
PiperOrigin-RevId: 717916568
2025-01-21 07:53:49 -08:00
Ray Smith b93231a47d Moved the vit config fields to their own config struct
PiperOrigin-RevId: 715692800
2025-01-15 01:09:49 -08:00
Ray Smith 9d40f0117e Added ability to load/save a complete model file, including tokenizer.
PiperOrigin-RevId: 707914366
2024-12-19 07:59:41 -08:00
Daniel Keysers 62c70d6715 Rename ModelTraining to PromptWrapping which is a more accurate name.
PiperOrigin-RevId: 705881500
2024-12-13 07:45:59 -08:00
Daniel Keysers 331d2ccc02 Add support for 448px resolution to PaliGemma and PaliGemma2.
PiperOrigin-RevId: 704361579
2024-12-09 11:38:10 -08:00
Phil Culliton 9dfe2a76be Internal change
PiperOrigin-RevId: 702961613
2024-12-04 20:41:47 -08:00
Ray Smith 7d685a267f Added pybind for configs.
Added ability to test configs for equality.

PiperOrigin-RevId: 697572671
2024-11-18 04:03:51 -08:00
Daniel Keysers 719699f132 Make top_k a runtime argument (instead of a model argument).
PiperOrigin-RevId: 696170691
2024-11-13 09:48:59 -08:00
Jan Wassenberg 868b01601f Simpler MatMul interface, vocab types, Tristate for use_spinning
Add Extents2D, Range2D vocab types
Matmul uses ConstMat for inputs and RowPtr for output
Move RowVectorBatch to basics.h
Separate threading.cc
Fix topology string: report cores not LPs, and #HT
Move QStride/IsMHA into LayerConfig
ImageTokens does not require make_unique.
matmul_test: no longer require template args
PiperOrigin-RevId: 692963605
2024-11-04 07:48:29 -08:00
Daniel Keysers 583bd93e9a Factor out addition of ViTConfig to a ModelConfig.
Use ModelConfig values for ImageTokens.
Output timing info for image token generation.
Add a method to copy image data into Image class directly.
Minor changes: pipe ModelTraining to more places.

PiperOrigin-RevId: 690572283
2024-10-28 05:29:33 -07:00
Daniel Keysers c6384574db Fix PaliGemma's GenerateImageTokensT().
Move image related config values from LayerConfig to ModelConfig.
Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum()

PiperOrigin-RevId: 687210519
2024-10-18 01:34:13 -07:00
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Krzysztof Ostrowski 12291e1ac0 Internal change.
PiperOrigin-RevId: 681583569
2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski b3239bf509 Internal change.
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Daniel Keysers 606427022c Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
Daniel Keysers f8835fe4a4 Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.

PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg 301dc8067a Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul
Supports converting all weight/activation formats to native MulT (bf16/f32)

Also:
- ConstMat/MutableMat for const correctness
- Move RowVectorBatch to allocator.h so it can be used from Matmul
- Add matmul.h so MatMulEnv can be used from Activations
- Remove kMaxThreads, detect from PerClusterPools
- Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h

```
zen4 new
64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp:   616.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp:   460.7 GFLOPS.
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp:    598.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp:    435.6 GFLOPS.

zen4 old
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp:    257.5 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp:    231.9 GFLOPS.
```

PiperOrigin-RevId: 663729812
2024-08-16 07:52:20 -07:00
Apoorv Reddy fd1b0743a7 Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B.
This is to make it clear that these models are part of the Gemma2 family of models.

PiperOrigin-RevId: 661181682
2024-08-09 02:09:06 -07:00
Phil Culliton 1982a6ba00 Internal change
PiperOrigin-RevId: 657831926
2024-07-30 20:24:54 -07:00
Daniel Keysers 5a751a9a44 Update gemma-27b to the correct query scaling.
PiperOrigin-RevId: 653201646
2024-07-17 05:43:52 -07:00
The gemma.cpp Authors df3fb70802 Improve readability with RepeatedAttentionWindowSizes
PiperOrigin-RevId: 651431738
2024-07-11 09:11:46 -07:00
Kan Wu f519ab6693 Refactor configurables.
PiperOrigin-RevId: 651259154
2024-07-10 21:30:58 -07:00
Kan Wu 7e4b20455e Add sliding window attention for Gemma 2.
PiperOrigin-RevId: 648778253
2024-07-02 11:08:03 -07:00
Jan Wassenberg e588a7f45d Add config for att/final cap, skip max-subtract. Fixes #278
Also update includes/deps for backprop/.

PiperOrigin-RevId: 648399222
2024-07-01 09:45:26 -07:00
Paul Chang 8ac5d66575 Introduce new Gemma 9B and 27B configs
PiperOrigin-RevId: 647299080
2024-06-27 06:45:24 -07:00
The gemma.cpp Authors a85725614a Refactor kCachePosSize and kCacheLayerSize into separate functors.
PiperOrigin-RevId: 645048519
2024-06-20 08:52:08 -07:00
Paul Chang d7d9d14f0e Move kGriffinLayers into ConfigNoSSM, set kGemmaLayers directly
For regular (non-SSM) Gemma models, kGriffinLayers is by definition always zero
and kGemmaLayers is just the number of layers.

PiperOrigin-RevId: 644384531
2024-06-18 07:52:52 -07:00
Jan Wassenberg 29c0c574e6 Integrate matmul into FFW: 4.3x prefill speedup
```
before, bf16:
27.2929 prefill tokens / sec
17.2114 tokens / sec

after, bf16
116.496 prefill tokens / sec
17.5391 tokens / sec
```

PiperOrigin-RevId: 643328437
2024-06-14 06:32:26 -07:00
Jan Wassenberg c15ff9529c Reduce duplication in Config* by inheriting no-SSM
PiperOrigin-RevId: 643030629
2024-06-13 09:48:56 -07:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command line flags, but the default value retains the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Jan Wassenberg 57c2cd8b52 Simplifications: remove GemmaInterface and GemmaImpl
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc

PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Zoltan Szabadka 36e4d8bbfe Add first version of backpropagation support.
This is still in progress / experimental, currently it is only
implemented for normal gemma MQA attention layers, and no
parallelism is added yet for backward pass.

Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
2024-06-04 08:37:49 +00:00
Paul Chang bacba351d4 Support additional scaling
PiperOrigin-RevId: 631429113
2024-05-07 08:16:25 -07:00
Jan Wassenberg 12fb2f05cf Add per-thread even_odd storage for #166.
Also inline ProjQ and ProjKV lambdas,
add missing includes/deps for ops_test.

PiperOrigin-RevId: 629460608
2024-04-30 10:42:23 -07:00