mirror of https://github.com/google/gemma.cpp.git
SfpCodec::Dec2F and CompressTraits<T>::Decompress2 for all supported types. It also allows removing one of the specializations of GEMM_4x4_Tile, so that compressed MatB is handled by a single function. As before, even when MatA is bf16, computations use 32-bit registers.

Measurements for a 2b-it sfp-encoded model on an AMD Ryzen Threadripper PRO 3945WX 12-Cores:

baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```

this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```

PiperOrigin-RevId: 651369694

| Name |
|---|
| evals |
| instantiations |
| activations.h |
| common.cc |
| common.h |
| configs.h |
| gemma-inl.h |
| gemma.cc |
| gemma.h |
| kv_cache.cc |
| kv_cache.h |
| ops.h |
| ops_test.cc |
| run.cc |
| tokenizer.cc |
| tokenizer.h |
| weights.cc |
| weights.h |
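
The commit message notes that computations use 32-bit registers even when MatA is bf16. The following is a minimal scalar sketch of that idea, not the code in this repository (which is vectorized): each bf16 input is widened to f32 before the multiply-accumulate. All function names here are hypothetical and chosen only for illustration, and MatB is assumed to have already been decompressed to f32.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widens one bf16 value, given as its raw 16 bits, to f32.
// bf16 is the upper half of an IEEE-754 binary32, so widening is a 16-bit shift.
inline float F32FromBF16Bits(uint16_t bits) {
  const uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof(f));
  return f;
}

// Hypothetical helper: truncating f32 -> bf16 conversion (no rounding),
// used here only to build example inputs.
inline uint16_t BF16BitsFromF32(float f) {
  uint32_t wide;
  std::memcpy(&wide, &f, sizeof(wide));
  return static_cast<uint16_t>(wide >> 16);
}

// Dot product of a bf16 row of A with an f32 column of B.
// Every product and the running sum are kept in 32-bit floats.
inline float DotBF16F32(const uint16_t* a, const float* b, size_t n) {
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    sum += F32FromBF16Bits(a[i]) * b[i];
  }
  return sum;
}
```

Accumulating in f32 avoids compounding bf16's 8-bit significand error across a long dot product; the inputs lose precision once when stored, but the sum does not.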