621 Commits

Author SHA1 Message Date
Eric Curtin a971088ac2 Refactor `gemma/common.cc` to improve readability and safety
Use `std::size` for array size calculations. Replace C-style
string manipulations with `std::string` methods. Simplify
`std::transform` usage for case conversion.

Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-09 16:36:25 +00:00
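
The three techniques named in the commit message above can be illustrated with a minimal sketch. This is not the code from `gemma/common.cc`: the table `kExampleNames` and the helpers `ToLower` and `FindName` are hypothetical names chosen for illustration only.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical lookup table; not the actual contents of gemma/common.cc.
constexpr const char* kExampleNames[] = {"2b-it", "7b-it", "gr2b-it"};

// Lowercases ASCII characters with a single std::transform call.
// The cast to unsigned char avoids undefined behavior in std::tolower
// for negative char values.
inline std::string ToLower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Returns the index of `name` in kExampleNames (case-insensitive), or -1.
// std::size replaces the sizeof(array) / sizeof(array[0]) idiom, and
// std::string comparison replaces strcmp-style calls.
inline int FindName(const std::string& name) {
  const std::string lower = ToLower(name);
  for (std::size_t i = 0; i < std::size(kExampleNames); ++i) {
    if (lower == kExampleNames[i]) return static_cast<int>(i);
  }
  return -1;
}
```

With these assumed helpers, `FindName("7B-IT")` yields the index of `"7b-it"`, and an unknown name yields -1. `std::size` requires C++17.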
The gemma.cpp Authors 66bb435121 No public description
PiperOrigin-RevId: 704178245
2024-12-09 00:49:36 -08:00
Phil Culliton 9dfe2a76be Internal change
PiperOrigin-RevId: 702961613
2024-12-04 20:41:47 -08:00
Jan Wassenberg 6a34e9c547 Print cache info and update Highway version for that
PiperOrigin-RevId: 702318451
2024-12-03 06:31:52 -08:00
Jan Wassenberg f74d496879 Threading/infra improvements.
* Add Parallelize*Range helpers and partitioning helpers
* Refactor Pinning class, store original affinity (required to construct another NestedPools after pinning happened)

Compress:
* prevent Compress printing stats in tests
* zero-pad tensors

Matmul:
* add matmul_unit_test (TODO) and bench_matmul
* matmul_test: change norm to row vectors (that is what is added) and include bf16 rounding error
* Prepare for L2/L3 retrieval
PiperOrigin-RevId: 700603811
2024-11-27 01:12:00 -08:00
Stanko Novakovic 109a4d9f85 Add a simple benchmark for batching.
This is a simple Gemma benchmark with a fixed batch size of 32.

PiperOrigin-RevId: 698843573
2024-11-21 10:59:49 -08:00
Ray Smith 3d1625d8c5 Improved consistency of compressor API, and added a universal method with a target type arg.
Moved configs pybind up to root level.

PiperOrigin-RevId: 698743417
2024-11-21 05:27:40 -08:00
Ray Smith 73640d2521 Added tensor_index as a single source of truth on tensor shapes/sources and transformations
PiperOrigin-RevId: 697903886
2024-11-19 00:25:39 -08:00
Ray Smith 7d685a267f Added pybind for configs.
Added ability to test configs for equality.

PiperOrigin-RevId: 697572671
2024-11-18 04:03:51 -08:00
Jan Wassenberg 36f02ef892 Internal change.
PiperOrigin-RevId: 696815335
2024-11-15 02:22:32 -08:00
Ray Smith 96513a8dc3 Added a blob_compare tool that compares two sbs files that may have the blobs in a different order
PiperOrigin-RevId: 696458888
2024-11-14 03:26:32 -08:00
Paul Chang 5674c33dc5 Replace CLIF SbsWriter with pybind-based gcpp extension
Maintains compatibility with previous version.

PiperOrigin-RevId: 696181603
2024-11-13 10:20:02 -08:00
Daniel Keysers 719699f132 Make top_k a runtime argument (instead of a model argument).
PiperOrigin-RevId: 696170691
2024-11-13 09:48:59 -08:00
Paul Chang b94295b6d9 Internal changes
PiperOrigin-RevId: 696155630
2024-11-13 09:01:38 -08:00
Daniel Keysers e54d9cbddd Fix Griffin model:
- use HalfRope position encodings
- zero-initialize the caches for each Generate at position 0

The lack of the latter made the tests in gemma_test dependent on each other.

PiperOrigin-RevId: 694509054
2024-11-08 08:30:53 -08:00
Paul Chang d4050a2917 Expose BlobReader::Keys()
PiperOrigin-RevId: 694166186
2024-11-07 10:28:39 -08:00
Jan Wassenberg 868b01601f Simpler MatMul interface, vocab types, Tristate for use_spinning
Add Extents2D, Range2D vocab types
Matmul uses ConstMat for inputs and RowPtr for output
Move RowVectorBatch to basics.h
Separate threading.cc
Fix topology string: report cores not LPs, and #HT
Move QStride/IsMHA into LayerConfig
ImageTokens does not require make_unique.
matmul_test: no longer require template args
PiperOrigin-RevId: 692963605
2024-11-04 07:48:29 -08:00
Jan Wassenberg baaa221787 Move BF16 to basics.h for easier access, and use that typedef.
PiperOrigin-RevId: 691422334
2024-10-30 08:09:11 -07:00
Daniel Keysers ed4091921f Reduce time for optimize_test and use exactly one (unpinned) thread.
PiperOrigin-RevId: 691013413
2024-10-29 07:37:22 -07:00
Daniel Keysers 583bd93e9a Factor out addition of ViTConfig to a ModelConfig.
Use ModelConfig values for ImageTokens.
Output timing info for image token generation.
Add a method to copy image data into Image class directly.
Minor changes: pipe ModelTraining to more places.

PiperOrigin-RevId: 690572283
2024-10-28 05:29:33 -07:00
Jan Wassenberg 19cfe14c76 Warning fixes (casts) and fix Windows build for aligned_alloc
PiperOrigin-RevId: 689734618
2024-10-25 04:14:04 -07:00
Jan Wassenberg 52af531820 Serialization for class members for use with ModelConfig
PiperOrigin-RevId: 689720027
2024-10-25 03:12:34 -07:00
Copybara-Service efff64605a Merge pull request #435 from ufownl:feature/disable_topology
PiperOrigin-RevId: 689399357
2024-10-24 08:55:23 -07:00
RangerUFO ec3b27326b Add a compilation option to disable topology
2024-10-24 18:32:43 +08:00
Paul Chang 4976066095 Try disabling benchmark's gtest integration
PiperOrigin-RevId: 689010657
2024-10-23 10:12:45 -07:00
Paul Chang 4197d69dfc New blob_store_test, ensure ReadOne checks actual size against requested size
PiperOrigin-RevId: 688974390
2024-10-23 08:30:46 -07:00
Copybara-Service 91bf2317ff Merge pull request #426 from ufownl:feature/read_image_from_stream
PiperOrigin-RevId: 688137436
2024-10-21 08:00:23 -07:00
Copybara-Service 054935d24b Merge pull request #432 from ufownl:bugfix/compress_weights_ce
PiperOrigin-RevId: 688126076
2024-10-21 07:18:53 -07:00
RangerUFO 7d313aaade Fix compilation errors of "compress_weights" target
2024-10-19 21:30:30 +08:00
RangerUFO fcea743107 Fix Bazel builder failure
2024-10-19 19:54:46 +08:00
Jan Wassenberg 02ce1e344f Use NestedPools, add NUMA infra
Improved threading.h, fix thread counts for single package/cluster systems
Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92.

Also fix benchmarks.cc build, update tensor allocator to Allocator

PiperOrigin-RevId: 687307167
2024-10-18 08:11:18 -07:00
Daniel Keysers c6384574db Fix PaliGemma's GenerateImageTokensT().
Move image related config values from LayerConfig to ModelConfig.
Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum()

PiperOrigin-RevId: 687210519
2024-10-18 01:34:13 -07:00
RangerUFO e48fc3abb4 Refactor the overloads of `Image::ReadPPM` method
Remove the `std::istream` overload and directly parse the PPM format on
the span. Load the image bytes in the file using `ReadFileToString`
helper defined in "compression/io.h" instead of `std::ifstream`.
2024-10-18 02:10:29 +08:00
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
RangerUFO de2f7d7e2c Add an overload of `Image::ReadPPM` method
Make it able to load image data from a `hwy::Span`.
2024-10-16 17:34:11 +08:00
RangerUFO a784b8459d Add an overload of `Image::ReadPPM` method
Make it able to load image data from a stream.
2024-10-16 15:53:27 +08:00
Daniel Keysers a4d6adbc43 Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.
Remove max_tokens (and rely on only max_generated_tokens).

PiperOrigin-RevId: 685662260
2024-10-14 04:45:21 -07:00
Copybara-Service 2892e232e2 Merge pull request #422 from ufownl:bugfix/compress_weights_ce
PiperOrigin-RevId: 685635493
2024-10-14 02:46:33 -07:00
Daniel Keysers 5d0167904d Fix PaliGemma model loading.
PiperOrigin-RevId: 685591935
2024-10-13 23:48:55 -07:00
Daniel Keysers b7eff19be4 Update expected ranges in dot_test.
PiperOrigin-RevId: 685591625
2024-10-13 23:47:20 -07:00
RangerUFO ed88115e6a Fix compilation error of the weights compression tool
2024-10-11 18:55:06 +08:00
The gemma.cpp Authors dfda53e634 Benchmark gemma.cpp with different length inputs.
PiperOrigin-RevId: 684607945
2024-10-10 15:59:26 -07:00
Daniel Keysers 3cf519a53e Remove unused "two-sizes" version of MulByConstAndAdd.
PiperOrigin-RevId: 684515900
2024-10-10 11:32:25 -07:00
Daniel Keysers 1eb9ce19dd Update expected ranges in dot_test.
PiperOrigin-RevId: 684515143
2024-10-10 11:31:19 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith 85958f5fd3 Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.

PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Daniel Keysers a570e3f662 Reduce number of operations in Gelu() by one Mul.
About 5% faster Gen.Activation.

PiperOrigin-RevId: 684035719
2024-10-09 07:50:48 -07:00
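
The commit message above does not say which multiplication was removed. As one possible illustration, and not necessarily the change made in that commit, the sketch below assumes the common tanh-based GELU approximation and shows how folding two constants together saves a multiplication in the tanh argument. The names `GeluReference`, `GeluFolded`, `kSqrt2OverPi`, and `kCubic` are hypothetical, and the code is scalar rather than SIMD.

```cpp
#include <cmath>

// Constants of the common tanh-based GELU approximation (assumed here).
constexpr float kSqrt2OverPi = 0.7978845608f;
constexpr float kCubic = 0.044715f;

// Reference form: the tanh argument k * x * (1 + c * x * x)
// costs four multiplications.
inline float GeluReference(float x) {
  const float arg = kSqrt2OverPi * x * (1.0f + kCubic * x * x);
  return 0.5f * x * (1.0f + std::tanh(arg));
}

// Folded form: k * c is computed at compile time, so the tanh argument
// x * (k + (k * c) * x * x) costs three multiplications.
inline float GeluFolded(float x) {
  constexpr float kFolded = kSqrt2OverPi * kCubic;
  const float arg = x * (kSqrt2OverPi + kFolded * x * x);
  return 0.5f * x * (1.0f + std::tanh(arg));
}
```

Both forms are algebraically identical, so their results should differ only by floating-point rounding.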
Jan Wassenberg 2c28b18eb0 Add NestedPools: one per socket/cluster
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
2024-10-07 09:40:19 -07:00
Jan Wassenberg bd53b0f7c3 Fix MSAN issue for multiturn. Rewind the prior EOS token.
Also move MaybeCheckInitialized to allocator.h

PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg 5a71d819cb Also enable f64 dot/sum for <f32 inputs
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.

PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00