gemma.cpp

Andrey Vlasov 3e92088595 Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing SfpCodec::Dec2F and CompressTraits<T>::Decompress2 for all supported types. This also removes one of the GEMM_4x4_Tile specializations, so a single function now handles compressed MatB. As before, computations use 32-bit registers even when MatA is bf16.

Measurements for a 2b-it sfp-encoded model on an AMD Ryzen Threadripper PRO 3945WX 12-Cores:

baseline:
```
32.6254 prefill tokens / sec
8.91429 tokens / sec
115 milliseconds time to first token
```
this change:
```
54.3045 prefill tokens / sec
16.8191 tokens / sec
56 milliseconds time to first token
```
PiperOrigin-RevId: 651369694		2024-07-11 05:13:39 -07:00
..
python	Cleanup: move util/compress and convert_weights to compression/	2024-07-05 04:16:52 -07:00
BUILD	Move benchmark_helper to evals/, weights_raw to compression/.	2024-07-08 01:13:23 -07:00
analyze.h	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
blob_store.cc	Make BlobWriter::Add() accept const void*	2024-05-17 08:11:06 -07:00
blob_store.h	Make BlobWriter::Add() accept const void*	2024-05-17 08:11:06 -07:00
compress-inl.h	Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing	2024-07-11 05:13:39 -07:00
compress.h	Add compression/ comments, especially on SFP range	2024-06-11 05:47:49 -07:00
compress_weights.cc	Refactor configurables.	2024-07-10 21:30:58 -07:00
convert_weights.py	Cleanup: move util/compress and convert_weights to compression/	2024-07-05 04:16:52 -07:00
distortion.h	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
distortion_test.cc	Update distortion.h to weighted average, add distortion_test.	2024-04-17 01:44:19 -07:00
io.cc	Further improve IO, enable multiple backends without -D.	2024-04-19 00:40:29 -07:00
io.h	Major duplicated code reduction in test/benchmarks	2024-06-14 00:16:25 -07:00
io_win.cc	Further improve IO, enable multiple backends without -D.	2024-04-19 00:40:29 -07:00
nuq-inl.h	revert back to HWY_ASSERT for lane constraints, qualify hn::Add	2024-06-04 10:10:18 -07:00
nuq.h	initial commit	2024-02-21 03:31:22 +00:00
nuq_test.cc	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
sfp-inl.h	Remove allocation from GEMM_4x4_Tile when decoding compressed weights by implementing	2024-07-11 05:13:39 -07:00
sfp.h	initial commit	2024-02-21 03:31:22 +00:00
sfp_test.cc	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
test_util.h	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
weights_raw.h	Refactor configurables.	2024-07-10 21:30:58 -07:00