Commit Graph

7 Commits

Author SHA1 Message Date
Jan Wassenberg 229bd078a1 1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage.
Also increase dot_test.cc range for Zen4, and matmul_test tolerance (failing in some configs)

PiperOrigin-RevId: 801789922
2025-09-01 06:34:04 -07:00
Jan Wassenberg cd80d8b24d Speed up builds by skipping rarely used targets
Centralize previous code into GEMMA_DISABLED_TARGETS

PiperOrigin-RevId: 772433723
2025-06-17 05:44:20 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Phil Culliton 7ccc6abe87 Allow conversion, loading and inference with NUQ.
PiperOrigin-RevId: 723507890
2025-02-05 07:45:54 -08:00
Phil Culliton 8a6edff319 Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests.
PiperOrigin-RevId: 721821577
2025-01-31 10:35:32 -08:00
Jan Wassenberg 8c0a8834c1 Major compression update, arbitrary-len unpack + new Dot
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_

Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
Jan Wassenberg c29e9752c7 Refactor/cleanup, remove even_odd
* New compression/shared.h, remove sfp.h
* Remove unused DistortionStats b_l1_
* Move exact arithmetic functions into fp_arith
* Remove even_odd optimization for MatVec (mostly unused)
* use BF16 typedef more widely
* Add kMaxSFP constant

PiperOrigin-RevId: 670996386
2024-09-04 09:25:13 -07:00