mirror of https://github.com/google/gemma.cpp.git
SfpCodec::Dec2F and CompressTraits<T>::Decompress2 for all supported types. It also allows removing one of the specializations of GEMM_4x4_Tile, so that compressed MatB is handled by a single function. As before, even when MatA is bf16, computations use 32-bit registers.

Measurements for a 2b-it sfp-encoded model on an AMD Ryzen Threadripper PRO 3945WX 12-Cores:

baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```

this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```

PiperOrigin-RevId: 651369694
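The commit message notes that computations use 32-bit registers even when MatA is bf16. As a rough scalar illustration of that idea (not the repository's SIMD code), the sketch below widens bf16 inputs to float before multiplying and accumulates in float. The helper names `F32FromBF16Bits`, `BF16BitsFromF32`, and `DotBF16F32` are hypothetical and do not come from gemma.cpp; the only fact relied on is that a bf16 value is the upper 16 bits of an IEEE-754 binary32.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen one bf16 value (given as its raw 16 bits) to
// float. bf16 is the upper half of a binary32, so a 16-bit shift is exact.
inline float F32FromBF16Bits(uint16_t bits) {
  const uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float out;
  std::memcpy(&out, &widened, sizeof(out));
  return out;
}

// Hypothetical helper: truncate a float to bf16 bits (no rounding), which is
// enough for building small test inputs.
inline uint16_t BF16BitsFromF32(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: dot product of a bf16 row with a float column.
// Every product and the running sum are kept in 32-bit float.
inline float DotBF16F32(const uint16_t* a, const float* b, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    sum += F32FromBF16Bits(a[i]) * b[i];
  }
  return sum;
}
```

A vectorized kernel would perform the same widening on whole registers at once; the scalar form only shows where the 32-bit arithmetic happens.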
| Name |
|---|
| python |
| BUILD |
| analyze.h |
| blob_store.cc |
| blob_store.h |
| compress-inl.h |
| compress.h |
| compress_weights.cc |
| convert_weights.py |
| distortion.h |
| distortion_test.cc |
| io.cc |
| io.h |
| io_win.cc |
| nuq-inl.h |
| nuq.h |
| nuq_test.cc |
| sfp-inl.h |
| sfp.h |
| sfp_test.cc |
| test_util.h |
| weights_raw.h |