gemma.cpp/compression
Jan Wassenberg cf4d7ceb82 1.16x decode speedup: remove last MatVec in Attention
Precompute row pointers.
Remove no longer used MHA support; QStride -> qkv_dim.
Remove RowPtr from MatMul interface, use only MatPtrT.
Require opt-in define for NUQ to speed up builds.
Also fix io.cc on Windows.

PiperOrigin-RevId: 766228108
2025-06-02 09:40:29 -07:00
python              1.16x decode speedup: remove last MatVec in Attention  2025-06-02 09:40:29 -07:00
BUILD.bazel         Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
analyze.h           Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
compress-inl.h      Remove backprop/                                       2025-05-28 07:01:17 -07:00
compress.cc         Minor cleanup, on-demand NUQ buffer allocation         2025-04-16 10:49:43 -07:00
compress.h          Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
compress_test.cc    Huge refactor of weight handling and model loading.    2025-05-06 04:44:21 -07:00
distortion.h        Refactor/cleanup, remove even_odd                      2024-09-04 09:25:13 -07:00
distortion_test.cc  Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
nuq-inl.h           Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
nuq_test.cc         1.16x decode speedup: remove last MatVec in Attention  2025-06-02 09:40:29 -07:00
sfp-inl.h           Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
sfp_test.cc         Minor: rename compression/shared -> types.h            2025-05-13 06:53:21 -07:00
test_util-inl.h     1.16x decode speedup: remove last MatVec in Attention  2025-06-02 09:40:29 -07:00
types.h             1.16x decode speedup: remove last MatVec in Attention  2025-06-02 09:40:29 -07:00