Yao Chen
029f2d3e98
Implement the matmul op with oneDNN to leverage AMX optimization.
...
PiperOrigin-RevId: 683370269
2024-10-08 12:18:35 -07:00
Jan Wassenberg
2c28b18eb0
Add NestedPools: one per socket/cluster
...
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
2024-10-07 09:40:19 -07:00
Jan Wassenberg
bd53b0f7c3
Fix MSAN issue for multiturn. Rewind the prior EOS token.
...
Also move MaybeCheckInitialized to allocator.h
PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg
5a71d819cb
Also enable f64 dot/sum for <f32 inputs
...
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.
PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
Ray Smith
895ee4c6ce
Moved internal code around to simplify
...
PiperOrigin-RevId: 681877329
2024-10-03 07:55:21 -07:00
Krzysztof Ostrowski
12291e1ac0
Internal change.
...
PiperOrigin-RevId: 681583569
2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski
b3239bf509
Internal change.
...
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Jan Wassenberg
96d2ab7d31
Minor fix to profiler zone and add comment
...
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Daniel Keysers
dc2e5f1505
PaliGemma: fix image loading.
...
Use uint8_t to make sure values are not interpreted as signed char.
PiperOrigin-RevId: 680965115
2024-10-01 04:54:04 -07:00
Jan Wassenberg
7d9fcda0d8
-467ms startup: parallel Reshape
...
Also split Softmax into Argmax helper, add comments;
add profiler zones + fix IDE warning
PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
Daniel Keysers
d83ad76679
Rename one variable in SampleTopK and update TestSampleTopK.
...
PiperOrigin-RevId: 680897787
2024-10-01 00:51:33 -07:00
Jan Wassenberg
2d14d796e3
1.09x decode speedup for topk=1/temp0: fuse softmax and sample
...
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg
897f902d28
Fix include order, required to build with profiler enabled
...
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Jan Wassenberg
5e812f07f5
Use f64 Dot and sum in softmax - faster than Cascaded
...
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.
PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg
47eb80a90e
Add double-precision dot variant
...
PiperOrigin-RevId: 679243590
2024-09-26 12:09:10 -07:00
Daniel Keysers
71116daf64
Tiny update of the README formatting.
...
PiperOrigin-RevId: 679162673
2024-09-26 08:38:12 -07:00
Daniel Keysers
709143e9a6
Add download location of PaliGemma weights to README.md.
...
PiperOrigin-RevId: 679127088
2024-09-26 06:38:11 -07:00
Jan Wassenberg
1bd64ec350
1.6x speedup of MatMulSlow using compensated Dot
...
PiperOrigin-RevId: 679063289
2024-09-26 02:42:53 -07:00
Daniel Keysers
606427022c
Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
...
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
Daniel Keysers
2290eb7d3f
Reduce flakiness of dot_test.
...
PiperOrigin-RevId: 679049273
2024-09-26 01:54:27 -07:00
Copybara-Service
e3507190ae
Merge pull request #394 from ufownl:bugfix/prefix_lm
...
PiperOrigin-RevId: 678710685
2024-09-25 08:25:31 -07:00
RangerUFO
d1010337c3
Fix prefix-LM mode assertion
2024-09-25 22:22:28 +08:00
Jan Wassenberg
e70e686805
Add forward and backward error
...
PiperOrigin-RevId: 678297584
2024-09-24 10:10:29 -07:00
Daniel Keysers
673673cc98
Update expected entropy values for GRIFFIN_2B model.
...
These changed after introduction of "Cascaded summation for Softmax"
PiperOrigin-RevId: 678145851
2024-09-24 02:12:59 -07:00
Daniel Keysers
f8835fe4a4
Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
...
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.
PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg
c6c10e0a53
Fix topology display for platforms where it fails (Apple)
...
PiperOrigin-RevId: 677800053
2024-09-23 08:14:54 -07:00
Jan Wassenberg
cdbfebb10f
Fix compress-inl bf16->f32 overrun
...
Caught by Arm hwasan but not x86 asan.
PiperOrigin-RevId: 677779421
2024-09-23 07:10:25 -07:00
Jan Wassenberg
35fdf848c7
Cascaded summation for Softmax
...
This can affect generation results after a few hundred tokens.
Also remove profiler from DecompressAndCall, use Add instead of +=,
use PackedSpan for args and remove alignment requirement.
Changing accumulation order in AssimilateCascadedSums updates dot_test thresholds.
PiperOrigin-RevId: 676891797
2024-09-20 10:31:23 -07:00
Copybara-Service
09bc8d62cc
Merge pull request #380 from ufownl:bugfix/threading
...
PiperOrigin-RevId: 676799495
2024-09-20 04:52:48 -07:00
Jan Wassenberg
bb6b398df3
Add pairwise sum dot products for testing
...
Also add wrapper function for threshold comparison.
PiperOrigin-RevId: 676749760
2024-09-20 01:48:52 -07:00
RangerUFO
62be3b98ce
Fix the warnings reported by Clang
2024-09-19 13:57:24 +08:00
RangerUFO
42ab476a9a
Fix the file name conflicts on case-insensitive systems
2024-09-19 13:54:35 +08:00
Daniel Keysers
03f0ee2323
Add tests for SampleTopK that highlight existing problems and fix those:
...
- Sampling was not correct for k>1 and temperature=0.
- Sampling was not correct for only negative logits.
Also restructure the code a bit for better readability and add some asserts for things that shouldn't happen.
PiperOrigin-RevId: 676043267
2024-09-18 10:32:01 -07:00
Daniel Keysers
760a69449e
Add entropy expectations for Griffin-2b model in gemma_test and make sure it passes.
...
PiperOrigin-RevId: 675564389
2024-09-17 07:46:06 -07:00
Daniel Keysers
e4ba93412a
Add const batch accessor to RowVectorBatch.
...
PiperOrigin-RevId: 675530484
2024-09-17 05:42:14 -07:00
Daniel Keysers
892f3bbcbe
Implement scalar version of LayerNorm
...
PiperOrigin-RevId: 675085495
2024-09-16 03:54:10 -07:00
Daniel Keysers
1c8ddcdffe
Adds insert_float() to SbsWriter() to store a float array directly.
...
PiperOrigin-RevId: 673982528
2024-09-12 13:27:24 -07:00
Jan Wassenberg
13a9f76f64
Fix mismatch between blob_store and compress interfaces (bytes)
...
PiperOrigin-RevId: 673027268
2024-09-10 10:59:17 -07:00
Jan Wassenberg
8c0a8834c1
Major compression update, arbitrary-len unpack + new Dot
...
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_
Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
Jan Wassenberg
5c0da8c8c3
Minor cleanup/fixes:
...
- optimize_test simplify prompt check
- Fix SFP arg case
- Fix includes
- Align inputs in test
- IsInside: add DASSERT
- Fix PerClusterPool NumThreads
PiperOrigin-RevId: 672530385
2024-09-09 06:58:09 -07:00
Jan Wassenberg
c29e9752c7
Refactor/cleanup, remove even_odd
...
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* use BF16 typedef more widely
* Add kMaxSFP constant
PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00
Jan Wassenberg
07c34cb18a
Further nuq_test speedups to prevent timeout
...
PiperOrigin-RevId: 670863385
2024-09-04 00:49:44 -07:00
Jan Wassenberg
9661b81c4b
Fix NUQ for SVE - incorrect nibble packing
...
Also speed up test
PiperOrigin-RevId: 670625545
2024-09-03 10:59:01 -07:00
Jan Wassenberg
aa11ddf5fc
1.22x NUQ compress speedup, fix out of bounds access, improve numerics
...
Also clarify the cost computation and move toward non-group-multiple num.
PiperOrigin-RevId: 670544245
2024-09-03 07:10:56 -07:00
Daniel Keysers
437e0eb9af
Internal change. Slight restructuring of gemma_test.
...
PiperOrigin-RevId: 670529565
2024-09-03 06:16:09 -07:00
Daniel Keysers
a8e08778d4
Add an additional QueryModel() overload to GemmaEnv.
...
Use args only in GemmaEnv constructor, store everything else in RuntimeConfig.
Add runtime option to turn off thread spinning.
PiperOrigin-RevId: 670467320
2024-09-03 02:25:19 -07:00
Zoltan Szabadka
f6abbab3a4
Fix asan failure in local attention computation.
...
PiperOrigin-RevId: 670207380
2024-09-02 07:06:10 -07:00
Paul Chang
22d9476aad
Demonstrate constrained decoding in gemma_cpp's hello world example
...
PiperOrigin-RevId: 669327521
2024-08-30 08:03:07 -07:00
Jan Wassenberg
4033ed9e78
Avoid duplication of RMSNorm, support all activation/weight types
...
Add test for RMSNorm
Rename VectorizedRopeAndMulBy -> RopeAndMulBy
Move test_util to util/
PiperOrigin-RevId: 668332927
2024-08-28 01:26:55 -07:00
Daniel Keysers
3c17911875
Make gemma_test slightly more lenient on MultiTurn.
...
PiperOrigin-RevId: 668097277
2024-08-27 12:40:16 -07:00