Commit Graph

475 Commits

Author SHA1 Message Date
Yao Chen 029f2d3e98 Implement the matmul op with oneDNN to leverage AMX optimizations.
PiperOrigin-RevId: 683370269
2024-10-08 12:18:35 -07:00
Jan Wassenberg 2c28b18eb0 Add NestedPools: one per socket/cluster
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
2024-10-07 09:40:19 -07:00
Jan Wassenberg bd53b0f7c3 Fix MSAN issue for multiturn. Rewind the prior EOS token.
Also move MaybeCheckInitialized to allocator.h

PiperOrigin-RevId: 683187458
2024-10-07 08:07:54 -07:00
Jan Wassenberg 5a71d819cb Also enable f64 dot/sum for <f32 inputs
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.

PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
Ray Smith 895ee4c6ce Move internal code around to simplify
PiperOrigin-RevId: 681877329
2024-10-03 07:55:21 -07:00
Krzysztof Ostrowski 12291e1ac0 Internal change.
PiperOrigin-RevId: 681583569
2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski b3239bf509 Internal change.
PiperOrigin-RevId: 681530185
2024-10-02 11:33:06 -07:00
Jan Wassenberg 96d2ab7d31 Minor fix to profiler zone and add comment
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Daniel Keysers dc2e5f1505 PaliGemma: fix image loading.
Use uint8_t to make sure values are not interpreted as signed char.

PiperOrigin-RevId: 680965115
2024-10-01 04:54:04 -07:00
Jan Wassenberg 7d9fcda0d8 -467ms startup: parallel Reshape
Also split Softmax into Argmax helper, add comments;
add profiler zones + fix IDE warning

PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
Daniel Keysers d83ad76679 Rename one variable in SampleTopK and update TestSampleTopK.
PiperOrigin-RevId: 680897787
2024-10-01 00:51:33 -07:00
Jan Wassenberg 2d14d796e3 1.09x decode speedup for topk=1/temp0: fuse softmax and sample
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg 897f902d28 Fix include order, required to build with profiler enabled
PiperOrigin-RevId: 680574177
2024-09-30 07:52:50 -07:00
Jan Wassenberg 5e812f07f5 Use f64 Dot and sum in softmax - faster than Cascaded
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.

PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg 47eb80a90e Add double-precision dot variant
PiperOrigin-RevId: 679243590
2024-09-26 12:09:10 -07:00
Daniel Keysers 71116daf64 Tiny update of the README formatting.
PiperOrigin-RevId: 679162673
2024-09-26 08:38:12 -07:00
Daniel Keysers 709143e9a6 Add download location of PaliGemma weights to README.md.
PiperOrigin-RevId: 679127088
2024-09-26 06:38:11 -07:00
Jan Wassenberg 1bd64ec350 1.6x speedup of MatMulSlow using compensated Dot
PiperOrigin-RevId: 679063289
2024-09-26 02:42:53 -07:00
Daniel Keysers 606427022c Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct.
PiperOrigin-RevId: 679049377
2024-09-26 01:55:26 -07:00
Daniel Keysers 2290eb7d3f Reduce flakiness of dot_test.
PiperOrigin-RevId: 679049273
2024-09-26 01:54:27 -07:00
Copybara-Service e3507190ae Merge pull request #394 from ufownl:bugfix/prefix_lm
PiperOrigin-RevId: 678710685
2024-09-25 08:25:31 -07:00
RangerUFO d1010337c3 Fix prefix-LM mode assertion
2024-09-25 22:22:28 +08:00
Jan Wassenberg e70e686805 Add forward and backward error
PiperOrigin-RevId: 678297584
2024-09-24 10:10:29 -07:00
Daniel Keysers 673673cc98 Update expected entropy values for GRIFFIN_2B model.
These changed after the introduction of "Cascaded summation for Softmax".

PiperOrigin-RevId: 678145851
2024-09-24 02:12:59 -07:00
Daniel Keysers f8835fe4a4 Add support for PaliGemma Vision-LM (224x224) to gemma.cpp
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.

PiperOrigin-RevId: 677841119
2024-09-23 10:09:38 -07:00
Jan Wassenberg c6c10e0a53 Fix topology display for platforms where it fails (Apple)
PiperOrigin-RevId: 677800053
2024-09-23 08:14:54 -07:00
Jan Wassenberg cdbfebb10f Fix compress-inl bf16->f32 overrun
Caught by Arm hwasan but not x86 asan.

PiperOrigin-RevId: 677779421
2024-09-23 07:10:25 -07:00
Jan Wassenberg 35fdf848c7 Cascaded summation for Softmax
This can affect generation results after a few hundred tokens.

Also remove profiler from DecompressAndCall, use Add instead of +=,
use PackedSpan for args and remove alignment requirement.
Changing accumulation order in AssimilateCascadedSums updates dot_test thresholds.

PiperOrigin-RevId: 676891797
2024-09-20 10:31:23 -07:00
Copybara-Service 09bc8d62cc Merge pull request #380 from ufownl:bugfix/threading
PiperOrigin-RevId: 676799495
2024-09-20 04:52:48 -07:00
Jan Wassenberg bb6b398df3 Add pairwise sum dot products for testing
Also add wrapper function for threshold comparison.

PiperOrigin-RevId: 676749760
2024-09-20 01:48:52 -07:00
RangerUFO 62be3b98ce Fix warnings reported by Clang
2024-09-19 13:57:24 +08:00
RangerUFO 42ab476a9a Fix file name conflicts on case-insensitive systems
2024-09-19 13:54:35 +08:00
Daniel Keysers 03f0ee2323 Add tests for SampleTopK that highlight existing problems and fix those:
- Sampling was not correct for k>1 and temperature=0.
- Sampling was not correct for only negative logits.

Also restructure the code a bit for better readability and add some asserts for things that shouldn't happen.

PiperOrigin-RevId: 676043267
2024-09-18 10:32:01 -07:00
Daniel Keysers 760a69449e Add entropy expectations for Griffin-2b model in gemma_test and make sure it passes.
PiperOrigin-RevId: 675564389
2024-09-17 07:46:06 -07:00
Daniel Keysers e4ba93412a Add const batch accessor to RowVectorBatch.
PiperOrigin-RevId: 675530484
2024-09-17 05:42:14 -07:00
Daniel Keysers 892f3bbcbe Implement scalar version of LayerNorm
PiperOrigin-RevId: 675085495
2024-09-16 03:54:10 -07:00
Daniel Keysers 1c8ddcdffe Add insert_float() to SbsWriter() to store a float array directly.
PiperOrigin-RevId: 673982528
2024-09-12 13:27:24 -07:00
Jan Wassenberg 13a9f76f64 Fix mismatch between blob_store and compress interfaces (bytes)
PiperOrigin-RevId: 673027268
2024-09-10 10:59:17 -07:00
Jan Wassenberg 8c0a8834c1 Major compression update, arbitrary-len unpack + new Dot
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_

Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* Use FloatPtr and MatPtr typedefs in tests to reduce stack usage
* Rename args to packed/raw
* Remove Traits::Name; use TypeName<T>() instead
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
Jan Wassenberg 5c0da8c8c3 Minor cleanup/fixes:
- optimize_test simplify prompt check
- Fix SFP arg case
- Fix includes
- Align inputs in test
- IsInside: add DASSERT
- Fix PerClusterPool NumThreads

PiperOrigin-RevId: 672530385
2024-09-09 06:58:09 -07:00
Jan Wassenberg c29e9752c7 Refactor/cleanup, remove even_odd
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* Use BF16 typedef more widely
* Add kMaxSFP constant

PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00
Jan Wassenberg 07c34cb18a Further nuq_test speedups to prevent timeout
PiperOrigin-RevId: 670863385
2024-09-04 00:49:44 -07:00
Jan Wassenberg 9661b81c4b Fix NUQ for SVE - incorrect nibble packing
Also speed up test

PiperOrigin-RevId: 670625545
2024-09-03 10:59:01 -07:00
Jan Wassenberg aa11ddf5fc 1.22x NUQ compress speedup, fix out of bounds access, improve numerics
Also clarify the cost computation and move toward non-group-multiple num.

PiperOrigin-RevId: 670544245
2024-09-03 07:10:56 -07:00
Daniel Keysers 437e0eb9af Internal change. Slight restructuring of gemma_test.
PiperOrigin-RevId: 670529565
2024-09-03 06:16:09 -07:00
Daniel Keysers a8e08778d4 Add an additional QueryModel() overload to GemmaEnv.
Use args only in GemmaEnv constructor, store everything else in RuntimeConfig.
Add runtime option to turn off thread spinning.

PiperOrigin-RevId: 670467320
2024-09-03 02:25:19 -07:00
Zoltan Szabadka f6abbab3a4 Fix asan failure in local attention computation.
PiperOrigin-RevId: 670207380
2024-09-02 07:06:10 -07:00
Paul Chang 22d9476aad Demonstrate constrained decoding in gemma_cpp's hello world example
PiperOrigin-RevId: 669327521
2024-08-30 08:03:07 -07:00
Jan Wassenberg 4033ed9e78 Avoid duplication of RMSNorm, support all activation/weight types
Add test for RMSNorm
Rename VectorizedRopeAndMulBy -> RopeAndMulBy

Move test_util to util/

PiperOrigin-RevId: 668332927
2024-08-28 01:26:55 -07:00
Daniel Keysers 3c17911875 Make gemma_test slightly more lenient on MultiTurn.
PiperOrigin-RevId: 668097277
2024-08-27 12:40:16 -07:00