The gemma.cpp Authors
dfda53e634
Benchmark gemma.cpp with different length inputs.
...
PiperOrigin-RevId: 684607945
2024-10-10 15:59:26 -07:00
Daniel Keysers
3cf519a53e
Remove unused "two-sizes" version of MulByConstAndAdd.
...
PiperOrigin-RevId: 684515900
2024-10-10 11:32:25 -07:00
Daniel Keysers
1eb9ce19dd
Update expected ranges in dot_test.
...
PiperOrigin-RevId: 684515143
2024-10-10 11:31:19 -07:00
Jan Wassenberg
6ab3ff5bde
Minor cleanup, Windows+Bazel build fixes
...
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Ray Smith
85958f5fd3
Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray.
...
Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.
PiperOrigin-RevId: 684451604
2024-10-10 08:22:30 -07:00
Daniel Keysers
a570e3f662
Reduce number of operations in Gelu() by one Mul.
...
About 5% faster Gen.Activation.
PiperOrigin-RevId: 684035719
2024-10-09 07:50:48 -07:00
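The message for a570e3f662 does not say which multiplication was removed. As an illustration only, the scalar sketch below shows one way the tanh-based GELU approximation can drop a multiplication by folding constants. `GeluReference`, `GeluFolded`, and both constants are hypothetical names, not gemma.cpp code, and the actual change may differ.

```cpp
#include <cmath>

// Hypothetical scalar sketch; names and constants are illustrative and are
// not taken from gemma.cpp.
constexpr float kSqrt2OverPi = 0.7978845608f;
constexpr float kCubic = 0.044715f;

// Textbook tanh approximation: six multiplications per call.
inline float GeluReference(float x) {
  const float inner = kSqrt2OverPi * (x + kCubic * x * x * x);
  return 0.5f * x * (1.0f + std::tanh(inner));
}

// Same function with the two constants folded at compile time:
// five multiplications per call.
inline float GeluFolded(float x) {
  constexpr float kFolded = kSqrt2OverPi * kCubic;
  const float inner = x * (kSqrt2OverPi + kFolded * x * x);
  return 0.5f * x * (1.0f + std::tanh(inner));
}
```

The two forms are algebraically identical, so any difference between them comes from rounding alone.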
Jan Wassenberg
2c28b18eb0
Add NestedPools: one per socket/cluster
...
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
2024-10-07 09:40:19 -07:00
Jan Wassenberg
bd53b0f7c3
Fix MSAN issue for multiturn. Rewind the prior EOS token.
...
Also move MaybeCheckInitialized to allocator.h
PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg
5a71d819cb
Also enable f64 dot/sum for <f32 inputs
...
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.
PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
Ray Smith
895ee4c6ce
Moved internal code around to simplify
...
PiperOrigin-RevId: 681877329
2024-10-03 07:55:21 -07:00
Krzysztof Ostrowski
12291e1ac0
Internal change.
...
PiperOrigin-RevId: 681583569
2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski
b3239bf509
Internal change.
...
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Jan Wassenberg
96d2ab7d31
Minor fix to profiler zone and add comment
...
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Daniel Keysers
dc2e5f1505
PaliGemma: fix image loading.
...
Use uint8_t to make sure values are not interpreted as signed char.
PiperOrigin-RevId: 680965115
2024-10-01 04:54:04 -07:00
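The pitfall named in dc2e5f1505 can be sketched as follows. On platforms where plain `char` is signed, a byte such as 0xFF converts to -1 rather than 255 unless it is first reinterpreted as `uint8_t`. `PixelToFloat` is a hypothetical helper written for this illustration, not the gemma.cpp image loader.

```cpp
#include <cstdint>

// Hypothetical helper, not part of gemma.cpp: converts one raw image byte
// to a float in [0, 1].
inline float PixelToFloat(char raw) {
  // Reinterpret as uint8_t before widening. Without this cast, a signed
  // char holding 0xFF would become -1.0f instead of 255.0f.
  const uint8_t value = static_cast<uint8_t>(raw);
  return static_cast<float>(value) / 255.0f;
}
```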
Jan Wassenberg
7d9fcda0d8
-467ms startup: parallel Reshape
...
Also split Softmax into Argmax helper, add comments;
add profiler zones + fix IDE warning
PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
Daniel Keysers
d83ad76679
Rename one variable in SampleTopK and update TestSampleTopK.
...
PiperOrigin-RevId: 680897787
2024-10-01 00:51:33 -07:00
Jan Wassenberg
2d14d796e3
1.09x decode speedup for topk=1/temp0: fuse softmax and sample
...
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg
897f902d28
Fix include order, required to build with profiler enabled
...
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Jan Wassenberg
5e812f07f5
Use f64 Dot and sum in softmax - faster than Cascaded
...
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.
PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg
47eb80a90e
Add double-precision dot variant
...
PiperOrigin-RevId: 679243590
2024-09-26 12:09:10 -07:00
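A minimal scalar sketch of the idea behind a double-precision dot variant: the inputs stay f32, but products and the running sum are kept in f64 and rounded once at the end. `DotDouble` is a hypothetical name. The kernel added in 47eb80a90e is vectorized, and its interface is not shown in this log.

```cpp
#include <cstddef>

// Hypothetical scalar sketch, not the gemma.cpp kernel: accumulate f32
// products in a double and round to float only once, on return.
inline float DotDouble(const float* a, const float* b, size_t num) {
  double sum = 0.0;
  for (size_t i = 0; i < num; ++i) {
    sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
  }
  return static_cast<float>(sum);
}
```

With inputs such as {1e8, 1, -1e8} against a vector of ones, an f32 accumulator loses the 1 when it is added to 1e8, whereas the f64 accumulator keeps it.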
Daniel Keysers
71116daf64
Tiny update of the README formatting.
...
PiperOrigin-RevId: 679162673
2024-09-26 08:38:12 -07:00
Daniel Keysers
709143e9a6
Add download location of PaliGemma weights to README.md.
...
PiperOrigin-RevId: 679127088
2024-09-26 06:38:11 -07:00
Jan Wassenberg
1bd64ec350
1.6x speedup of MatMulSlow using compensated Dot
...
PiperOrigin-RevId: 679063289
2024-09-26 02:42:53 -07:00
Daniel Keysers
606427022c
Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
...
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
Daniel Keysers
2290eb7d3f
Reduce flakiness of dot_test.
...
PiperOrigin-RevId: 679049273
2024-09-26 01:54:27 -07:00
Copybara-Service
e3507190ae
Merge pull request #394 from ufownl:bugfix/prefix_lm
...
PiperOrigin-RevId: 678710685
2024-09-25 08:25:31 -07:00
RangerUFO
d1010337c3
Fix prefix-LM mode assertion
2024-09-25 22:22:28 +08:00
Jan Wassenberg
e70e686805
Add forward and backward error
...
PiperOrigin-RevId: 678297584
2024-09-24 10:10:29 -07:00
Daniel Keysers
673673cc98
Update expected entropy values for GRIFFIN_2B model.
...
These changed after introduction of "Cascaded summation for Softmax"
PiperOrigin-RevId: 678145851
2024-09-24 02:12:59 -07:00
Daniel Keysers
f8835fe4a4
Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
...
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.
PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg
c6c10e0a53
Fix topology display for platforms where it fails (Apple)
...
PiperOrigin-RevId: 677800053
2024-09-23 08:14:54 -07:00
Jan Wassenberg
cdbfebb10f
Fix compress-inl bf16->f32 overrun
...
Caught by Arm hwasan but not x86 asan.
PiperOrigin-RevId: 677779421
2024-09-23 07:10:25 -07:00
Jan Wassenberg
35fdf848c7
Cascaded summation for Softmax
...
This can affect generation results after a few hundred tokens.
Also remove profiler from DecompressAndCall, use Add instead of +=,
use PackedSpan for args and remove alignment requirement.
Changing accumulation order in AssimilateCascadedSums updates dot_test thresholds.
PiperOrigin-RevId: 676891797
2024-09-20 10:31:23 -07:00
Copybara-Service
09bc8d62cc
Merge pull request #380 from ufownl:bugfix/threading
...
PiperOrigin-RevId: 676799495
2024-09-20 04:52:48 -07:00
Jan Wassenberg
bb6b398df3
Add pairwise sum dot products for testing
...
Also add wrapper function for threshold comparison.
PiperOrigin-RevId: 676749760
2024-09-20 01:48:52 -07:00
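For readers unfamiliar with the term in bb6b398df3, pairwise summation splits the input in half recursively and adds the two partial results, so rounding error grows roughly with log(n) rather than n. The recursive sketch below is an assumption-level illustration. `PairwiseDot` is a hypothetical name and does not reproduce the test code in the repository.

```cpp
#include <cstddef>

// Hypothetical sketch, not gemma.cpp code: dot product whose terms are
// combined by recursive halving (pairwise summation).
inline float PairwiseDot(const float* a, const float* b, size_t num) {
  if (num == 0) return 0.0f;
  if (num == 1) return a[0] * b[0];
  const size_t half = num / 2;
  // Sum each half independently, then combine the two partial sums.
  return PairwiseDot(a, b, half) +
         PairwiseDot(a + half, b + half, num - half);
}
```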
RangerUFO
62be3b98ce
Fix warnings reported by Clang
2024-09-19 13:57:24 +08:00
RangerUFO
42ab476a9a
Fix the file name conflicts on case-insensitive systems
2024-09-19 13:54:35 +08:00
Daniel Keysers
03f0ee2323
Add tests for SampleTopK that expose existing problems, and fix them:
...
- Sampling was not correct for k>1 and temperature=0.
- Sampling was not correct for only negative logits.
Also restructure the code a bit for better readability and add some asserts for things that shouldn't happen.
PiperOrigin-RevId: 676043267
2024-09-18 10:32:01 -07:00
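The message for 03f0ee2323 does not describe the root cause of either problem. As a hedged illustration of why all-negative logits deserve their own test, the sketch below shows a greedy (temperature 0) pick that seeds the running maximum from the first element instead of from 0, so it stays correct when every logit is negative. `ArgMaxLogit` is a hypothetical helper and is not the SampleTopK implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not gemma.cpp code: index of the largest logit.
// The running best starts at element 0 rather than at the value 0.0f,
// so inputs containing only negative logits are handled correctly.
inline size_t ArgMaxLogit(const std::vector<float>& logits) {
  assert(!logits.empty());
  size_t best = 0;
  for (size_t i = 1; i < logits.size(); ++i) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}
```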
Daniel Keysers
760a69449e
Add entropy expectations for Griffin-2b model in gemma_test and make sure it passes.
...
PiperOrigin-RevId: 675564389
2024-09-17 07:46:06 -07:00
Daniel Keysers
e4ba93412a
Add const batch accessor to RowVectorBatch.
...
PiperOrigin-RevId: 675530484
2024-09-17 05:42:14 -07:00
Daniel Keysers
892f3bbcbe
Implement scalar version of LayerNorm
...
PiperOrigin-RevId: 675085495
2024-09-16 03:54:10 -07:00
Daniel Keysers
1c8ddcdffe
Adds insert_float() to SbsWriter() to store a float array directly.
...
PiperOrigin-RevId: 673982528
2024-09-12 13:27:24 -07:00
Jan Wassenberg
13a9f76f64
Fix mismatch between blob_store and compress interfaces (bytes)
...
PiperOrigin-RevId: 673027268
2024-09-10 10:59:17 -07:00
Jan Wassenberg
8c0a8834c1
Major compression update, arbitrary-len unpack + new Dot
...
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_
Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
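The "Kahan dot" mentioned in 8c0a8834c1 refers to Kahan (compensated) summation applied to the products. A scalar sketch under stated assumptions: `KahanDot` is a hypothetical name, the code is not vectorized, and it only compensates the additions, not the rounding of each product. It also relies on the compiler not reassociating floating-point operations, so it should not be built with flags such as `-ffast-math`.

```cpp
#include <cstddef>

// Hypothetical scalar sketch, not the gemma.cpp kernel: dot product with
// Kahan summation. `comp` carries the low-order bits lost by each add.
inline float KahanDot(const float* a, const float* b, size_t num) {
  float sum = 0.0f;
  float comp = 0.0f;
  for (size_t i = 0; i < num; ++i) {
    const float term = a[i] * b[i] - comp;  // re-inject the lost bits
    const float next = sum + term;          // low bits of term may be lost
    comp = (next - sum) - term;             // recover what was lost
    sum = next;
  }
  return sum;
}
```

A plain f32 loop adding four values of 2^-25 to 1.0 returns 1.0, because each addend is below half an ulp. The compensated loop carries the lost bits forward and returns 1 + 2^-23.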
Jan Wassenberg
5c0da8c8c3
Minor cleanup/fixes:
...
- optimize_test simplify prompt check
- Fix SFP arg case
- Fix includes
- Align inputs in test
- IsInside: add DASSERT
- Fix PerClusterPool NumThreads
PiperOrigin-RevId: 672530385
2024-09-09 06:58:09 -07:00
Jan Wassenberg
c29e9752c7
Refactor/cleanup, remove even_odd
...
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* use BF16 typedef more widely
* Add kMaxSFP constant
PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00
Jan Wassenberg
07c34cb18a
Further nuq_test speedups to prevent timeout
...
PiperOrigin-RevId: 670863385
2024-09-04 00:49:44 -07:00
Jan Wassenberg
9661b81c4b
Fix NUQ for SVE - incorrect nibble packing
...
Also speed up test
PiperOrigin-RevId: 670625545
2024-09-03 10:59:01 -07:00
Jan Wassenberg
aa11ddf5fc
1.22x NUQ compress speedup, fix out of bounds access, improve numerics
...
Also clarify the cost computation and move toward non-group-multiple num.
PiperOrigin-RevId: 670544245
2024-09-03 07:10:56 -07:00
Daniel Keysers
437e0eb9af
Internal change. Slight restructuring of gemma_test.
...
PiperOrigin-RevId: 670529565
2024-09-03 06:16:09 -07:00