Commit Graph

115 Commits

Author SHA1 Message Date
Jan Wassenberg b18bd781f6 Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts
PiperOrigin-RevId: 724340518
2025-02-07 07:38:55 -08:00
Jan Wassenberg f31e12e63b Improved blob diff: parallel, tolerance for float
PiperOrigin-RevId: 724060325
2025-02-06 13:46:28 -08:00
Jan Wassenberg 9f5159ff68 Public visibility for compression/
PiperOrigin-RevId: 723529541
2025-02-05 08:53:51 -08:00
Phil Culliton 7ccc6abe87 Allow conversion, loading and inference with NUQ.
PiperOrigin-RevId: 723507890
2025-02-05 07:45:54 -08:00
Phil Culliton 8a6edff319 Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests.
PiperOrigin-RevId: 721821577
2025-01-31 10:35:32 -08:00
Daniel Keysers 7af2e70321 Add python wrappers for configs and inference.
Enable building compression/python/compression_test using bazel.
Add default image path for image_test and paligemma_test.

PiperOrigin-RevId: 720583438
2025-01-28 08:22:03 -08:00
Daniel Keysers bcdb0d65bd Assorted small cleanups.
PiperOrigin-RevId: 720548132
2025-01-28 06:09:45 -08:00
Jan Wassenberg a60b564b88 Infra improvements (2)
ops.h: move CreateInvTimescale to allow calling without depending on gemma
Pass around MatMulEnv instead of pools to avoid re-creating the env
profiler.h can now be used outside SIMD code
allocator: add StepBytes and QuantumSteps
rename worker thread with package/cluster in the name
threading: add Visit* to IndexRange
PiperOrigin-RevId: 718766704
2025-01-23 01:55:19 -08:00
Jan Wassenberg c4398fc72d Infra improvements:
allocator: support mmap, fixed Bind, add padding
bench_matmul: Add PreventElision
BUILD: add ops_test build target
matmul.h: move ConstMat here; dynamic alloc of MatMulEnv
matmul_test: remove benchmarking
replace fprintf with HWY_WARN
threading.cc: support splitting large clusters (disabled); package_idx->pkg_idx, smaller IndexRangePartition
PiperOrigin-RevId: 717512274
2025-01-20 06:22:49 -08:00
Daniel Keysers 493688f6f1 Allow interactive use with new single-file weight format.
Add section about new weights format to README.md.
Remove model_type_required parameter.
Update error handling for flags.

PiperOrigin-RevId: 715788822
2025-01-15 07:22:33 -08:00
Ray Smith 9d40f0117e Added ability to load/save a complete model file, including tokenizer.
PiperOrigin-RevId: 707914366
2024-12-19 07:59:41 -08:00
The gemma.cpp Authors 5bc356f18f Internal change
PiperOrigin-RevId: 707268913
2024-12-17 15:15:57 -08:00
Daniel Keysers 62c70d6715 Rename ModelTraining to PromptWrapping which is a more accurate name.
PiperOrigin-RevId: 705881500
2024-12-13 07:45:59 -08:00
Ray Smith 6254f2e5ca Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT
PiperOrigin-RevId: 705085054
2024-12-11 06:30:28 -08:00
Ray Smith e69bc3bc1c Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future.
Corrected some errors in the TensorIndex.

PiperOrigin-RevId: 705014619
2024-12-11 01:26:35 -08:00
Jan Wassenberg 642fc97d51 Internal change
PiperOrigin-RevId: 704692923
2024-12-10 06:58:32 -08:00
Jan Wassenberg f74d496879 Threading/infra improvements.
* Add Parallelize*Range helpers and partitioning helpers
* Refactor Pinning class, store original affinity (required to construct another NestedPools after pinning happened)

Compress:
* prevent Compress printing stats in tests
* zero-pad tensors

Matmul:
* add matmul_unit_test (TODO) and bench_matmul
* matmul_test: change norm to row vectors (that is what is added) and include bf16 rounding error
* Prepare for L2/L3 retrieval
PiperOrigin-RevId: 700603811
2024-11-27 01:12:00 -08:00
Ray Smith 3d1625d8c5 Improved consistency of compressor API, and added a universal method with a target type arg.
Moved configs pybind up to root level.

PiperOrigin-RevId: 698743417
2024-11-21 05:27:40 -08:00
Ray Smith 73640d2521 Added tensor_index as a single source of truth on tensor shapes/sources and transformations
PiperOrigin-RevId: 697903886
2024-11-19 00:25:39 -08:00
Ray Smith 96513a8dc3 Added a blob_compare tool that compares two sbs files that may have the blobs in a different order
PiperOrigin-RevId: 696458888
2024-11-14 03:26:32 -08:00
Paul Chang 5674c33dc5 Replace CLIF SbsWriter with pybind-based gcpp extension
Maintains compatibility with previous version.

PiperOrigin-RevId: 696181603
2024-11-13 10:20:02 -08:00
Paul Chang b94295b6d9 Internal changes
PiperOrigin-RevId: 696155630
2024-11-13 09:01:38 -08:00
Paul Chang d4050a2917 Expose BlobReader::Keys()
PiperOrigin-RevId: 694166186
2024-11-07 10:28:39 -08:00
Jan Wassenberg 868b01601f Simpler MatMul interface, vocab types, Tristate for use_spinning
Add Extents2D, Range2D vocab types
Matmul uses ConstMat for inputs and RowPtr for output
Move RowVectorBatch to basics.h
Separate threading.cc
Fix topology string: report cores not LPs, and #HT
Move QStride/IsMHA into LayerConfig
ImageTokens does not require make_unique.
matmul_test: no longer require template args
PiperOrigin-RevId: 692963605
2024-11-04 07:48:29 -08:00
Jan Wassenberg baaa221787 Move BF16 to basics.h for easier access, and use that typedef.
PiperOrigin-RevId: 691422334
2024-10-30 08:09:11 -07:00
Daniel Keysers 583bd93e9a Factor out addition of ViTConfig to a ModelConfig.
Use ModelConfig values for ImageTokens.
Output timing info for image token generation.
Add a method to copy image data into Image class directly.
Minor changes: pipe ModelTraining to more places.

PiperOrigin-RevId: 690572283
2024-10-28 05:29:33 -07:00
Jan Wassenberg 19cfe14c76 Warning fixes (casts) and fix Windows build for aligned_alloc
PiperOrigin-RevId: 689734618
2024-10-25 04:14:04 -07:00
Jan Wassenberg 52af531820 Serialization for class members for use with ModelConfig
PiperOrigin-RevId: 689720027
2024-10-25 03:12:34 -07:00
Paul Chang 4197d69dfc New blob_store_test, ensure ReadOne checks actual size against requested size
PiperOrigin-RevId: 688974390
2024-10-23 08:30:46 -07:00
RangerUFO 7d313aaade Fix compilation errors of "compress_weights" target 2024-10-19 21:30:30 +08:00
Jan Wassenberg 02ce1e344f Use NestedPools, add NUMA infra
Improved threading.h, fix thread counts for single package/cluster systems
Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92.

Also fix benchmarks.cc build, update tensor allocator to Allocator

PiperOrigin-RevId: 687307167
2024-10-18 08:11:18 -07:00
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
RangerUFO ed88115e6a Fix compilation error of the weights compression tool 2024-10-11 18:55:06 +08:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith 85958f5fd3 Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.

PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Jan Wassenberg bd53b0f7c3 Fix MSAN issue for multiturn. Rewind the prior EOS token.
Also move MaybeCheckInitialized to allocator.h

PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg 5a71d819cb Also enable f64 dot/sum for <f32 inputs
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.

PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
Jan Wassenberg 5e812f07f5 Use f64 Dot and sum in softmax - faster than Cascaded
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.

PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg 47eb80a90e Add double-precision dot variant
PiperOrigin-RevId: 679243590
2024-09-26 12:09:10 -07:00
Daniel Keysers f8835fe4a4 Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.

PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg cdbfebb10f Fix compress-inl bf16->f32 overrun
Caught by Arm hwasan but not x86 asan.

PiperOrigin-RevId: 677779421
2024-09-23 07:10:25 -07:00
Jan Wassenberg 35fdf848c7 Cascaded summation for Softmax
This can affect generation results after a few hundred tokens.

Also remove profiler from DecompressAndCall, use Add instead of +=,
use PackedSpan for args and remove alignment requirement.
Changing accumulation order in AssimilateCascadedSums updates dot_test thresholds.

PiperOrigin-RevId: 676891797
2024-09-20 10:31:23 -07:00
Copybara-Service 09bc8d62cc Merge pull request #380 from ufownl:bugfix/threading
PiperOrigin-RevId: 676799495
2024-09-20 04:52:48 -07:00
Daniel Keysers 1c8ddcdffe Adds insert_float() to SbsWriter() to store a float array directly.
PiperOrigin-RevId: 673982528
2024-09-12 13:27:24 -07:00
Jan Wassenberg 13a9f76f64 Fix mismatch between blob_store and compress interfaces (bytes)
PiperOrigin-RevId: 673027268
2024-09-10 10:59:17 -07:00
Jan Wassenberg 8c0a8834c1 Major compression update, arbitrary-len unpack + new Dot
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_

Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
Jan Wassenberg 5c0da8c8c3 Minor cleanup/fixes:
- optimize_test simplify prompt check
- Fix SFP arg case
- Fix includes
- Align inputs in test
- IsInside: add DASSERT
- Fix PerClusterPool NumThreads

PiperOrigin-RevId: 672530385
2024-09-09 06:58:09 -07:00
Jan Wassenberg c29e9752c7 Refactor/cleanup, remove even_odd
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* use BF16 typedef more widely
* Add kMaxSFP constant

PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00
Jan Wassenberg 07c34cb18a Further nuq_test speedups to prevent timeout
PiperOrigin-RevId: 670863385
2024-09-04 00:49:44 -07:00
Jan Wassenberg 9661b81c4b Fix NUQ for SVE - incorrect nibble packing
Also speed up test

PiperOrigin-RevId: 670625545
2024-09-03 10:59:01 -07:00