Commit Graph

15 Commits

Author SHA1 Message Date
Jan Wassenberg 2038dfd9cc Minor: rename compression/shared -> types.h
PiperOrigin-RevId: 758199851
2025-05-13 06:53:21 -07:00
Jan Wassenberg 8c0a8834c1 Major compression update, arbitrary-len unpack + new Dot
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_

Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
Jan Wassenberg c29e9752c7 Refactor/cleanup, remove even_odd
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* use BF16 typedef more widely
* Add kMaxSFP constant

PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00
Jan Wassenberg 4033ed9e78 Avoid duplication of RMSNorm, support all activation/weight types
Add test for RMSNorm
Rename VectorizedRopeAndMulBy -> RopeAndMulBy

Move test_util to util/

PiperOrigin-RevId: 668332927
2024-08-28 01:26:55 -07:00
Jan Wassenberg 2308514e5a Experiment with compensated dot product.
ULP difference vs exact is 0..1, vs 200-5000 for previous.
Runtime overhead is 2.5-4x for f32 input.

PiperOrigin-RevId: 668084019
2024-08-27 12:05:35 -07:00
Jan Wassenberg 1617e1a33d SFP speedup: 1.14x f32, 1.19x bf16 dot = 1.02x prefill
12->9 ops by recognizing the upper/lower bytes are simply shifted.

PiperOrigin-RevId: 659609241
2024-08-05 10:59:13 -07:00
Jan Wassenberg 5c3e5f7038 Remove no longer required stats.h - use Highway version instead
PiperOrigin-RevId: 640440379
2024-06-05 01:37:48 -07:00
Jan Wassenberg c5c9fc300c Enable even/odd for SFP. Refs #166
Disable it for float32 because there is not enough benefit.

PiperOrigin-RevId: 631788326
2024-05-08 07:09:06 -07:00
Jan Wassenberg b5a9ade75f 2x speedup of SFP decode (1.4x overall) on AVX3_DL+.
Thanks @nzmichaelh for suggesting table lookups!

PiperOrigin-RevId: 631337524
2024-05-07 01:46:43 -07:00
Jan Wassenberg a939b5fc9f Update distortion.h to weighted average, add distortion_test.
More thorough checks in sfp_test and nuq_test.
nuq_test: use deterministic input generator.

PiperOrigin-RevId: 625602019
2024-04-17 01:44:19 -07:00
Jan Wassenberg a982ec1287 Move code to gemma/ so we can remove error-prone copybara: comments.
Also fix includes and Lint warnings.

PiperOrigin-RevId: 623127487
2024-04-09 04:45:42 -07:00
Jan Wassenberg 61e031fe98 Towards building tests without GUnit Refs #29
PiperOrigin-RevId: 618032987
2024-03-21 19:28:02 -07:00
Jan Wassenberg 24add61dd9 Fix SFP/NUQ for bf16 rounding in Highway
SFP: Avoid rounding twice, and more robust TestDot.
NUQ: also more robust SNR, minor touchups to header.

PiperOrigin-RevId: 618030096
2024-03-21 19:06:19 -07:00
Jan Wassenberg bb9b023502 Support Bazel builds. Fixes #16
Also fix nuq/sfp-inl: warning, cast, and disable SCALAR

PiperOrigin-RevId: 612704056
2024-03-04 22:07:25 -08:00
Austin Huang e29cd566cf initial commit 2024-02-21 03:31:22 +00:00