Jan Wassenberg
19cfe14c76
Warning fixes (casts) and fix Windows build for aligned_alloc
PiperOrigin-RevId: 689734618
2024-10-25 04:14:04 -07:00
Jan Wassenberg
52af531820
Serialization for class members for use with ModelConfig
PiperOrigin-RevId: 689720027
2024-10-25 03:12:34 -07:00
Copybara-Service
efff64605a
Merge pull request #435 from ufownl:feature/disable_topology
PiperOrigin-RevId: 689399357
2024-10-24 08:55:23 -07:00
RangerUFO
ec3b27326b
Add a compilation option to disable topology
2024-10-24 18:32:43 +08:00
Paul Chang
4976066095
Try disabling benchmark's gtest integration
PiperOrigin-RevId: 689010657
2024-10-23 10:12:45 -07:00
Paul Chang
4197d69dfc
New blob_store_test, ensure ReadOne checks actual size against requested size
PiperOrigin-RevId: 688974390
2024-10-23 08:30:46 -07:00
Copybara-Service
91bf2317ff
Merge pull request #426 from ufownl:feature/read_image_from_stream
PiperOrigin-RevId: 688137436
2024-10-21 08:00:23 -07:00
Copybara-Service
054935d24b
Merge pull request #432 from ufownl:bugfix/compress_weights_ce
PiperOrigin-RevId: 688126076
2024-10-21 07:18:53 -07:00
RangerUFO
7d313aaade
Fix compilation errors of "compress_weights" target
2024-10-19 21:30:30 +08:00
RangerUFO
fcea743107
Fix Bazel builder failure
2024-10-19 19:54:46 +08:00
Jan Wassenberg
02ce1e344f
Use NestedPools, add NUMA infra
Improved threading.h, fix thread counts for single package/cluster systems
Temporarily forces use of a single socket. Prefill 29.28 tps, decode 6.92.
Also fix benchmarks.cc build, update tensor allocator to Allocator
PiperOrigin-RevId: 687307167
2024-10-18 08:11:18 -07:00
Daniel Keysers
c6384574db
Fix PaliGemma's GenerateImageTokensT().
Move image related config values from LayerConfig to ModelConfig.
Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum()
PiperOrigin-RevId: 687210519
2024-10-18 01:34:13 -07:00
RangerUFO
e48fc3abb4
Refactor the overloads of `Image::ReadPPM` method
Remove the `std::istream` overload and directly parse the PPM format on
the span. Load the image bytes in the file using `ReadFileToString`
helper defined in "compression/io.h" instead of `std::ifstream`.
2024-10-18 02:10:29 +08:00
Ray Smith
0d68555f87
Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly.
Adjusted WeightsWrapper, ForwardLayer, etc. to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also makes it possible (not yet done) to store the config in the blob file instead of specifying it separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.
PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
RangerUFO
de2f7d7e2c
Add an overload of `Image::ReadPPM` method
Make it able to load image data from a `hwy::Span`.
2024-10-16 17:34:11 +08:00
RangerUFO
a784b8459d
Add an overload of `Image::ReadPPM` method
Make it able to load image data from a stream.
2024-10-16 15:53:27 +08:00
Daniel Keysers
a4d6adbc43
Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.
Remove max_tokens (and rely only on max_generated_tokens).
PiperOrigin-RevId: 685662260
2024-10-14 04:45:21 -07:00
Copybara-Service
2892e232e2
Merge pull request #422 from ufownl:bugfix/compress_weights_ce
PiperOrigin-RevId: 685635493
2024-10-14 02:46:33 -07:00
Daniel Keysers
5d0167904d
Fix PaliGemma model loading.
PiperOrigin-RevId: 685591935
2024-10-13 23:48:55 -07:00
Daniel Keysers
b7eff19be4
Update expected ranges in dot_test.
PiperOrigin-RevId: 685591625
2024-10-13 23:47:20 -07:00
RangerUFO
ed88115e6a
Fix compilation error of the weights compression tool
2024-10-11 18:55:06 +08:00
The gemma.cpp Authors
dfda53e634
Benchmark gemma.cpp with different length inputs.
PiperOrigin-RevId: 684607945
2024-10-10 15:59:26 -07:00
Daniel Keysers
3cf519a53e
Remove unused "two-sizes" version of MulByConstAndAdd.
PiperOrigin-RevId: 684515900
2024-10-10 11:32:25 -07:00
Daniel Keysers
1eb9ce19dd
Update expected ranges in dot_test.
PiperOrigin-RevId: 684515143
2024-10-10 11:31:19 -07:00
Jan Wassenberg
6ab3ff5bde
Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith
85958f5fd3
Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.
PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Daniel Keysers
a570e3f662
Reduce number of operations in Gelu() by one Mul.
About 5% faster Gen.Activation.
PiperOrigin-RevId: 684035719
2024-10-09 07:50:48 -07:00
Jan Wassenberg
2c28b18eb0
Add NestedPools: one per socket/cluster
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
2024-10-07 09:40:19 -07:00
Jan Wassenberg
bd53b0f7c3
Fix MSAN issue for multiturn. Rewind the prior EOS token.
Also move MaybeCheckInitialized to allocator.h
PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg
5a71d819cb
Also enable f64 dot/sum for <f32 inputs
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.
PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
Ray Smith
895ee4c6ce
Moved Internal code around to simplify
PiperOrigin-RevId: 681877329
2024-10-03 07:55:21 -07:00
Krzysztof Ostrowski
12291e1ac0
Internal change.
PiperOrigin-RevId: 681583569
2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski
b3239bf509
Internal change.
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Jan Wassenberg
96d2ab7d31
Minor fix to profiler zone and add comment
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Daniel Keysers
dc2e5f1505
PaliGemma: fix image loading.
Use uint8_t to make sure values are not interpreted as signed char.
PiperOrigin-RevId: 680965115
2024-10-01 04:54:04 -07:00
Jan Wassenberg
7d9fcda0d8
-467ms startup: parallel Reshape
Also split Softmax into Argmax helper, add comments;
add profiler zones + fix IDE warning
PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
Daniel Keysers
d83ad76679
Rename one variable in SampleTopK and update TestSampleTopK.
PiperOrigin-RevId: 680897787
2024-10-01 00:51:33 -07:00
Jan Wassenberg
2d14d796e3
1.09x decode speedup for topk=1/temp0: fuse softmax and sample
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg
897f902d28
Fix include order, required to build with profiler enabled
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Jan Wassenberg
5e812f07f5
Use f64 Dot and sum in softmax - faster than Cascaded
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.
PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg
47eb80a90e
Add double-precision dot variant
PiperOrigin-RevId: 679243590
2024-09-26 12:09:10 -07:00
Daniel Keysers
71116daf64
Tiny update of the README formatting.
PiperOrigin-RevId: 679162673
2024-09-26 08:38:12 -07:00
Daniel Keysers
709143e9a6
Add download location of PaliGemma weights to README.md.
PiperOrigin-RevId: 679127088
2024-09-26 06:38:11 -07:00
Jan Wassenberg
1bd64ec350
1.6x speedup of MatMulSlow using compensated Dot
PiperOrigin-RevId: 679063289
2024-09-26 02:42:53 -07:00
Daniel Keysers
606427022c
Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
Daniel Keysers
2290eb7d3f
Reduce flakiness of dot_test.
PiperOrigin-RevId: 679049273
2024-09-26 01:54:27 -07:00
Copybara-Service
e3507190ae
Merge pull request #394 from ufownl:bugfix/prefix_lm
PiperOrigin-RevId: 678710685
2024-09-25 08:25:31 -07:00
RangerUFO
d1010337c3
Fix prefix-LM mode assertion
2024-09-25 22:22:28 +08:00
Jan Wassenberg
e70e686805
Add forward and backward error
PiperOrigin-RevId: 678297584
2024-09-24 10:10:29 -07:00
Daniel Keysers
673673cc98
Update expected entropy values for GRIFFIN_2B model.
These changed after the introduction of "Cascaded summation for Softmax".
PiperOrigin-RevId: 678145851
2024-09-24 02:12:59 -07:00