Commit Graph

45 Commits

Author SHA1 Message Date
Jan Wassenberg 56186193c1 Replace mt19937 with new generator to enable parallel sampling
Split it into immutable AesCtrEngine and RngStream
Also add RowSpan and Logits span

PiperOrigin-RevId: 803336423
2025-09-04 23:49:10 -07:00
Ivo Ristovski List b56b2f05e4 Automated Code Change
PiperOrigin-RevId: 789876258
2025-08-01 13:29:50 -07:00
Jan Wassenberg 799c264df3 Pre-tune thread pool before matmul
Also improve profiler annotations - remove near-zero ones and add more for startup

PiperOrigin-RevId: 789352414
2025-07-31 08:45:26 -07:00
Jan Wassenberg ac0d751d20 Rename GetModelConfig->Config
PiperOrigin-RevId: 788506480
2025-07-29 10:18:12 -07:00
Jan Wassenberg e76e29ce11 De-singleton ThreadingContext so callers can pass in their own
weights.cc: fix BindB argument for bf16 tensors
threading_test: enable autotune
PiperOrigin-RevId: 785763618
2025-07-22 02:08:46 -07:00
Jan Wassenberg a04cc287b2 Move MatMulEnv out of Gemma to enable concurrent calls
Also update benchmark_helper config print: add profiler, remove free mem

PiperOrigin-RevId: 774662974
2025-06-23 01:20:09 -07:00
Jan Wassenberg 834cbe5b39 linkstatic in most tests/binaries, remove fully_static_link
Also decrease "eternal" timeout to "long".

Add 2x/4x larger subsections of Frankenstein (from Gutenberg)

PiperOrigin-RevId: 773252901
2025-06-19 01:45:53 -07:00
Mukund Aggarwal 606e22155a Gemma CPP: move PaliGemma tests' helper to a separate class
This helps to be able to use PaliGemma functionalities directly for inference by just providing tokenizer and weight paths.

Added @mukundagg to allowed authors list.

PiperOrigin-RevId: 772705238
2025-06-17 18:37:24 -07:00
Jan Wassenberg e5c81f64a1 Major refactor: clarify query_idx (global) vs qi. Refs #607
Fix missing pos increment for last prefill and check that in gemma_test.
Thanks to @ufownl for pointing this out.

Change argument lists to QBatch with accessors.
Increase default seq_len to 8k.

PiperOrigin-RevId: 771937385
2025-06-16 02:42:02 -07:00
Jan Wassenberg b84149310b Fix paligemma, update its test
Must not pass image tokens to the EmbedMMToken used for text.
Caught by next presubmit test.

paligemma_test: move function bodies into class, regroup variables
PiperOrigin-RevId: 770040014
2025-06-11 02:12:12 -07:00
Daniel Keysers d7b23d532a Restructure internal initialization.
PiperOrigin-RevId: 769507096
2025-06-10 01:25:31 -07:00
Rhett Stucki 824a95793c Fix Image::WriteBinary() writing values to a file one at a time.
PiperOrigin-RevId: 767955187
2025-06-06 00:48:09 -07:00
The gemma.cpp Authors dd7d4a7717 Optimize Image::GetPatch() to copy rows instead of pixels at a time.
PiperOrigin-RevId: 767436146
2025-06-04 22:31:08 -07:00
Jan Wassenberg 6897313080 3x speedup of EmbedImagePatches - GEMM, not GEMV.
Required fixes to handling of non-vector aligned A.
Also move row ptrs to MatMulEnv.

PiperOrigin-RevId: 767029036
2025-06-04 01:18:52 -07:00
Jan Wassenberg 839a642992 Fix paligemma_test, refs #588
Detect PaliGemma models from layer names
Remove unused allocator arg from CreateInvTimescale
matmul: only warn once about dim divisibility
Print config also in tests if --verbosity 2
PiperOrigin-RevId: 766605131
2025-06-03 04:45:22 -07:00
Jan Wassenberg 2038dfd9cc Minor: rename compression/shared -> types.h
PiperOrigin-RevId: 758199851
2025-05-13 06:53:21 -07:00
Jan Wassenberg 252a4e955e Remove support for Gemma 1 and PaliGemma 1 models, superseded by (Pali)Gemma 2.
PiperOrigin-RevId: 756671308
2025-05-09 02:17:27 -07:00
Jan Wassenberg c8d92948f4 Move fields, io* and blob* from compression/ into io/
PiperOrigin-RevId: 755445712
2025-05-06 11:17:19 -07:00
Jan Wassenberg 275135d7e8 Rename-only: remove Allocator2 etc suffixes now that refactoring is complete
PiperOrigin-RevId: 755397220
2025-05-06 09:12:43 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Jan Wassenberg 160a5824fb Cleanup: include fixes/comments, fix leak, vector reserve
Also remove unused RowSpan
configs.cc: Assign prompt wrapping to ModelConfig
configs.h: simplify EnumValid via sentinel

PiperOrigin-RevId: 750278497
2025-04-22 12:01:46 -07:00
Jan Wassenberg 87a658b1c6 Minor cleanup, on-demand NUQ buffer allocation
threading_context: add profiler
compress-inl: add constexpr, on-demand alloc NUQ buffer
gemma_py: model->gemma
Move ScaleWeights to compress.cc
Move PromptWrapping to configs.h
PiperOrigin-RevId: 748347896
2025-04-16 10:49:43 -07:00
Jan Wassenberg 2e722f14f1 Add mmap support (not yet used)
Also: const-correct ArgsBase,
add assert to mat.h checking element_bytes_
BUILD deps update (:shared provides shared.h, not :sfp)
PiperOrigin-RevId: 746073312
2025-04-10 10:03:40 -07:00
Jan Wassenberg 8532da47f7 Major refactor of allocator/args:
use new ThreadingContext2 instead of monostate/init in each frontend
Add ThreadingArgs(replaces AppArgs)

backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride
compress_weights: remove, moving to py-only exporter instead

Move MatPtr to mat.h and revise interface:
- Generic MatOwner
- rename accessors to Packed*
- support stride/row accessors, fix RowPtr stride

Add TypeBits(Type)
Move GenerateMat to test_util-inl for sharing between matmul test/bench
Move internal init to gemma.cc to avoid duplication
Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage
Remove --compressed_weights, use --weights instead.
tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img
Allocator: use normal unique_ptr for AllocBytes so users can call directly
threading: use -> because AlignedPtr no longer assumes arrays
PiperOrigin-RevId: 745918637
2025-04-10 01:29:54 -07:00
Daniel Keysers 7af2e70321 Add python wrappers for configs and inference.
Enable building compression/python/compression_test using bazel.
Add default image path for image_test and paligemma_test.

PiperOrigin-RevId: 720583438
2025-01-28 08:22:03 -08:00
Daniel Keysers 493688f6f1 Allow interactive use with new single-file weight format.
Add section about new weights format to README.md.
Remove model_type_required parameter.
Update error handling for flags.

PiperOrigin-RevId: 715788822
2025-01-15 07:22:33 -08:00
Ray Smith b93231a47d Moved the vit config fields to their own config struct
PiperOrigin-RevId: 715692800
2025-01-15 01:09:49 -08:00
The gemma.cpp Authors 5bc356f18f Internal change
PiperOrigin-RevId: 707268913
2024-12-17 15:15:57 -08:00
Daniel Keysers 62c70d6715 Rename ModelTraining to PromptWrapping which is a more accurate name.
PiperOrigin-RevId: 705881500
2024-12-13 07:45:59 -08:00
Daniel Keysers 331d2ccc02 Add support for 448px resolution to PaliGemma and PaliGemma2.
PiperOrigin-RevId: 704361579
2024-12-09 11:38:10 -08:00
Daniel Keysers 719699f132 Make top_k a runtime argument (instead of a model argument).
PiperOrigin-RevId: 696170691
2024-11-13 09:48:59 -08:00
Paul Chang b94295b6d9 Internal changes
PiperOrigin-RevId: 696155630
2024-11-13 09:01:38 -08:00
Jan Wassenberg 868b01601f Simpler MatMul interface, vocab types, Tristate for use_spinning
Add Extents2D, Range2D vocab types
Matmul uses ConstMat for inputs and RowPtr for output
Move RowVectorBatch to basics.h
Separate threading.cc
Fix topology string: report cores not LPs, and #HT
Move QStride/IsMHA into LayerConfig
ImageTokens does not require make_unique.
matmul_test: no longer require template args
PiperOrigin-RevId: 692963605
2024-11-04 07:48:29 -08:00
Daniel Keysers 583bd93e9a Factor out addition of ViTConfig to a ModelConfig.
Use ModelConfig values for ImageTokens.
Output timing info for image token generation.
Add a method to copy image data into Image class directly.
Minor changes: pipe ModelTraining to more places.

PiperOrigin-RevId: 690572283
2024-10-28 05:29:33 -07:00
Jan Wassenberg 19cfe14c76 Warning fixes (casts) and fix Windows build for aligned_alloc
PiperOrigin-RevId: 689734618
2024-10-25 04:14:04 -07:00
Copybara-Service 91bf2317ff Merge pull request #426 from ufownl:feature/read_image_from_stream
PiperOrigin-RevId: 688137436
2024-10-21 08:00:23 -07:00
RangerUFO fcea743107 Fix Bazel builder failure 2024-10-19 19:54:46 +08:00
RangerUFO e48fc3abb4 Refactor the overloads of `Image::ReadPPM` method
Remove the `std::istream` overload and directly parse the PPM format on
the span. Load the image bytes in the file using `ReadFileToString`
helper defined in "compression/io.h" instead of `std::ifstream`.
2024-10-18 02:10:29 +08:00
RangerUFO de2f7d7e2c Add an overload of `Image::ReadPPM` method
Make it able to load image data from a `hwy::Span`.
2024-10-16 17:34:11 +08:00
RangerUFO a784b8459d Add an overload of `Image::ReadPPM` method
Make it able to load image data from a stream.
2024-10-16 15:53:27 +08:00
Daniel Keysers a4d6adbc43 Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.
Remove max_tokens (and rely on only max_generated_tokens).

PiperOrigin-RevId: 685662260
2024-10-14 04:45:21 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith 85958f5fd3 Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.

PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Daniel Keysers dc2e5f1505 PaliGemma: fix image loading.
Use uint8_t to make sure values are not interpreted as signed char.

PiperOrigin-RevId: 680965115
2024-10-01 04:54:04 -07:00
Daniel Keysers f8835fe4a4 Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.

PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00