Commit Graph

497 Commits

Author SHA1 Message Date
Paul Chang 4976066095 Try disabling benchmark's gtest integration
PiperOrigin-RevId: 689010657
2024-10-23 10:12:45 -07:00
Paul Chang 4197d69dfc New blob_store_test, ensure ReadOne checks actual size against requested size
PiperOrigin-RevId: 688974390
2024-10-23 08:30:46 -07:00
Copybara-Service 91bf2317ff Merge pull request #426 from ufownl:feature/read_image_from_stream
PiperOrigin-RevId: 688137436
2024-10-21 08:00:23 -07:00
Copybara-Service 054935d24b Merge pull request #432 from ufownl:bugfix/compress_weights_ce
PiperOrigin-RevId: 688126076
2024-10-21 07:18:53 -07:00
RangerUFO 7d313aaade Fix compilation errors of "compress_weights" target
2024-10-19 21:30:30 +08:00
RangerUFO fcea743107 Fix Bazel builder failure
2024-10-19 19:54:46 +08:00
Jan Wassenberg 02ce1e344f Use NestedPools, add NUMA infra
Improved threading.h, fix thread counts for single package/cluster systems
Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92.

Also fix benchmarks.cc build, update tensor allocator to Allocator

PiperOrigin-RevId: 687307167
2024-10-18 08:11:18 -07:00
Daniel Keysers c6384574db Fix PaliGemma's GenerateImageTokensT().
Move image related config values from LayerConfig to ModelConfig.
Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum()

PiperOrigin-RevId: 687210519
2024-10-18 01:34:13 -07:00
RangerUFO e48fc3abb4 Refactor the overloads of `Image::ReadPPM` method
Remove the `std::istream` overload and directly parse the PPM format on
the span. Load the image bytes in the file using `ReadFileToString`
helper defined in "compression/io.h" instead of `std::ifstream`.
2024-10-18 02:10:29 +08:00
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
RangerUFO de2f7d7e2c Add an overload of `Image::ReadPPM` method
Make it able to load image data from a `hwy::Span`.
2024-10-16 17:34:11 +08:00
RangerUFO a784b8459d Add an overload of `Image::ReadPPM` method
Make it able to load image data from a stream.
2024-10-16 15:53:27 +08:00
Daniel Keysers a4d6adbc43 Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.
Remove max_tokens (and rely on only max_generated_tokens).

PiperOrigin-RevId: 685662260
2024-10-14 04:45:21 -07:00
Copybara-Service 2892e232e2 Merge pull request #422 from ufownl:bugfix/compress_weights_ce
PiperOrigin-RevId: 685635493
2024-10-14 02:46:33 -07:00
Daniel Keysers 5d0167904d Fix PaliGemma model loading.
PiperOrigin-RevId: 685591935
2024-10-13 23:48:55 -07:00
Daniel Keysers b7eff19be4 Update expected ranges in dot_test.
PiperOrigin-RevId: 685591625
2024-10-13 23:47:20 -07:00
RangerUFO ed88115e6a Fix compilation error of the weights compression tool
2024-10-11 18:55:06 +08:00
The gemma.cpp Authors dfda53e634 Benchmark gemma.cpp with different length inputs.
PiperOrigin-RevId: 684607945
2024-10-10 15:59:26 -07:00
Daniel Keysers 3cf519a53e Remove unused "two-sizes" version of MulByConstAndAdd.
PiperOrigin-RevId: 684515900
2024-10-10 11:32:25 -07:00
Daniel Keysers 1eb9ce19dd Update expected ranges in dot_test.
PiperOrigin-RevId: 684515143
2024-10-10 11:31:19 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith 85958f5fd3 Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.

PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Daniel Keysers a570e3f662 Reduce number of operations in Gelu() by one Mul.
About 5% faster Gen.Activation.

PiperOrigin-RevId: 684035719
2024-10-09 07:50:48 -07:00
Jan Wassenberg 2c28b18eb0 Add NestedPools: one per socket/cluster
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
2024-10-07 09:40:19 -07:00
Jan Wassenberg bd53b0f7c3 Fix MSAN issue for multiturn. Rewind the prior EOS token.
Also move MaybeCheckInitialized to allocator.h

PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg 5a71d819cb Also enable f64 dot/sum for <f32 inputs
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.

PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
Ray Smith 895ee4c6ce Moved Internal code around to simplify
PiperOrigin-RevId: 681877329
2024-10-03 07:55:21 -07:00
Krzysztof Ostrowski 12291e1ac0 Internal change.
PiperOrigin-RevId: 681583569
2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski b3239bf509 Internal change.
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Jan Wassenberg 96d2ab7d31 Minor fix to profiler zone and add comment
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Daniel Keysers dc2e5f1505 PaliGemma: fix image loading.
Use uint8_t to make sure values are not interpreted as signed char.

PiperOrigin-RevId: 680965115
2024-10-01 04:54:04 -07:00
Jan Wassenberg 7d9fcda0d8 -467ms startup: parallel Reshape
Also split Softmax into Argmax helper, add comments;
add profiler zones + fix IDE warning

PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
Daniel Keysers d83ad76679 Rename one variable in SampleTopK and update TestSampleTopK.
PiperOrigin-RevId: 680897787
2024-10-01 00:51:33 -07:00
Jan Wassenberg 2d14d796e3 1.09x decode speedup for topk=1/temp0: fuse softmax and sample
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg 897f902d28 Fix include order, required to build with profiler enabled
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Jan Wassenberg 5e812f07f5 Use f64 Dot and sum in softmax - faster than Cascaded
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.

PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg 47eb80a90e Add double-precision dot variant
PiperOrigin-RevId: 679243590
2024-09-26 12:09:10 -07:00
Daniel Keysers 71116daf64 Tiny update of the README formatting.
PiperOrigin-RevId: 679162673
2024-09-26 08:38:12 -07:00
Daniel Keysers 709143e9a6 Add download location of Pali Gemma weights to README.md.
PiperOrigin-RevId: 679127088
2024-09-26 06:38:11 -07:00
Jan Wassenberg 1bd64ec350 1.6x speedup of MatMulSlow using compensated Dot
PiperOrigin-RevId: 679063289
2024-09-26 02:42:53 -07:00
Daniel Keysers 606427022c Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
Daniel Keysers 2290eb7d3f Reduce flakiness of dot_test.
PiperOrigin-RevId: 679049273
2024-09-26 01:54:27 -07:00
Copybara-Service e3507190ae Merge pull request #394 from ufownl:bugfix/prefix_lm
PiperOrigin-RevId: 678710685
2024-09-25 08:25:31 -07:00
RangerUFO d1010337c3 Fix prefix-LM mode assertion
2024-09-25 22:22:28 +08:00
Jan Wassenberg e70e686805 Add forward and backward error
PiperOrigin-RevId: 678297584
2024-09-24 10:10:29 -07:00
Daniel Keysers 673673cc98 Update expected entropy values for GRIFFIN_2B model.
These changed after introduction of "Cascaded summation for Softmax"

PiperOrigin-RevId: 678145851
2024-09-24 02:12:59 -07:00
Daniel Keysers f8835fe4a4 Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.

PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg c6c10e0a53 Fix topology display for platforms where it fails (Apple)
PiperOrigin-RevId: 677800053
2024-09-23 08:14:54 -07:00
Jan Wassenberg cdbfebb10f Fix compress-inl bf16->f32 overrun
Caught by Arm hwasan but not x86 asan.

PiperOrigin-RevId: 677779421
2024-09-23 07:10:25 -07:00
Jan Wassenberg 35fdf848c7 Cascaded summation for Softmax
This can affect generation results after a few hundred tokens.

Also remove profiler from DecompressAndCall, use Add instead of +=,
use PackedSpan for args and remove alignment requirement.
Changing accumulation order in AssimilateCascadedSums updates dot_test thresholds.

PiperOrigin-RevId: 676891797
2024-09-20 10:31:23 -07:00
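
Note on commit 35fdf848c7: the message says cascaded summation can change generation results and that a different accumulation order required new dot_test thresholds. The scalar sketch below illustrates why carrying rounding-error terms alongside a running sum changes the result. It uses Knuth's TwoSum, one common building block for compensated summation. It is not taken from the repository; `TwoSum` and `CompensatedSum` are hypothetical names chosen for illustration, and the actual vectorized code may differ substantially.

```cpp
#include <vector>

// Hypothetical helper (not from gemma.cpp): Knuth's TwoSum.
// Returns the rounded sum a + b and stores the rounding error in `err`,
// so that a + b == sum + err holds exactly (barring overflow).
inline float TwoSum(float a, float b, float& err) {
  const float sum = a + b;
  const float b_virtual = sum - a;
  const float a_virtual = sum - b_virtual;
  err = (a - a_virtual) + (b - b_virtual);
  return sum;
}

// Hypothetical helper: sums `v` while accumulating the rounding errors
// separately, then folds them back in once at the end.
inline float CompensatedSum(const std::vector<float>& v) {
  float sum = 0.0f;
  float comp = 0.0f;  // accumulated rounding errors
  for (float x : v) {
    float err;
    sum = TwoSum(sum, x, err);
    comp += err;
  }
  return sum + comp;
}
```

With inputs such as {1e8f, 1.0f, -1e8f}, a plain float loop typically loses the 1.0f because it is smaller than the spacing between floats near 1e8, whereas the compensated version recovers it. This assumes strict IEEE float evaluation; flags such as -ffast-math may optimize the error term away.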