gemma.cpp/compression
Jan Wassenberg 794a21a4e6 Major refactor to de-templatize gemma-inl and weights
This replaces per-weight-type instantiations of all the code with instantiations only per MatMul/norm kernel.
Reduces binary size by 133KiB.

WeightsOwner is no longer required for type erasure, so it is replaced with ModelWeightsPtrs.
Also removes the unused EmbedToken, which is replaced with EmbedMMToken.

PiperOrigin-RevId: 766497657
2025-06-02 23:01:35 -07:00
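
The sketch below is not the gemma.cpp code; the names (WeightType, MatPtr, DotRows, MatMul, LayerWeightsPtrs) are hypothetical. It only illustrates the general idea behind the refactor described above: weights are held through a type-erased handle, and only the innermost MatMul/norm kernels are instantiated per element type, so the rest of the model compiles once.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Runtime tag for the stored element type. The real code also covers
// compressed formats such as SFP and NUQ.
enum class WeightType { kF32, kBF16 };

// Type-erased view of one weight matrix: raw bytes plus a type tag and shape.
struct MatPtr {
  WeightType type;
  const void* data;
  size_t rows, cols;
};

// Decode helpers: the only code that knows the element layout.
inline float LoadF32(const void* p, size_t i) {
  return static_cast<const float*>(p)[i];
}
inline float LoadBF16(const void* p, size_t i) {
  // bf16 is the upper 16 bits of an IEEE-754 binary32.
  const uint32_t bits =
      static_cast<uint32_t>(static_cast<const uint16_t*>(p)[i]) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// The template lives only at the kernel, so each weight type adds one
// instantiation of this loop rather than of the whole model.
template <float (*Load)(const void*, size_t)>
void DotRows(const MatPtr& w, const float* x, float* out) {
  for (size_t r = 0; r < w.rows; ++r) {
    float acc = 0.0f;
    for (size_t c = 0; c < w.cols; ++c) {
      acc += Load(w.data, r * w.cols + c) * x[c];
    }
    out[r] = acc;
  }
}

// Dispatch on the element type once per MatMul call.
void MatMul(const MatPtr& w, const float* x, float* out) {
  switch (w.type) {
    case WeightType::kF32:  DotRows<LoadF32>(w, x, out);  break;
    case WeightType::kBF16: DotRows<LoadBF16>(w, x, out); break;
  }
}

// Layer/model code sees only type-erased pointers and compiles once.
struct LayerWeightsPtrs {
  MatPtr attn_qkv;
  MatPtr ffw_up;
};

int main() {
  const float w[4] = {1.0f, 2.0f, 3.0f, 4.0f};  // 2x2 row-major weights.
  const float x[2] = {1.0f, 1.0f};
  float out[2];
  const MatPtr mat{WeightType::kF32, w, 2, 2};
  MatMul(mat, x, out);
  std::printf("%.1f %.1f\n", out[0], out[1]);  // Expect 3.0 7.0.
  return 0;
}
```

Confining the template surface to the kernels keeps the number of instantiations small, which is where the reported binary-size saving comes from.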
Name               | Last commit                                           | Date
------------------ | ----------------------------------------------------- | --------------------------
python             | 1.16x decode speedup: remove last MatVec in Attention | 2025-06-02 09:40:29 -07:00
BUILD.bazel        | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
analyze.h          | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
compress-inl.h     | Major refactor to de-templatize gemma-inl and weights | 2025-06-02 23:01:35 -07:00
compress.cc        | Minor cleanup, on-demand NUQ buffer allocation        | 2025-04-16 10:49:43 -07:00
compress.h         | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
compress_test.cc   | Huge refactor of weight handling and model loading.   | 2025-05-06 04:44:21 -07:00
distortion.h       | Refactor/cleanup, remove even_odd                     | 2024-09-04 09:25:13 -07:00
distortion_test.cc | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
nuq-inl.h          | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
nuq_test.cc        | 1.16x decode speedup: remove last MatVec in Attention | 2025-06-02 09:40:29 -07:00
sfp-inl.h          | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
sfp_test.cc        | Minor: rename compression/shared -> types.h           | 2025-05-13 06:53:21 -07:00
test_util-inl.h    | 1.16x decode speedup: remove last MatVec in Attention | 2025-06-02 09:40:29 -07:00
types.h            | 1.16x decode speedup: remove last MatVec in Attention | 2025-06-02 09:40:29 -07:00