gemma.cpp

Jan Wassenberg 301dc8067a Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul	2024-08-16 07:52:20 -07:00

Supports converting all weight/activation formats to native MulT (bf16/f32)

Also:
- ConstMat/MutableMat for const correctness
- Move RowVectorBatch to allocator.h so it can be used from Matmul
- Add matmul.h so MatMulEnv can be used from Activations
- Remove kMaxThreads, detect from PerClusterPools
- Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h

```
zen4 new
64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp: 616.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp: 460.7 GFLOPS.
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 598.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 435.6 GFLOPS.

zen4 old
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 257.5 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 231.9 GFLOPS.
```

PiperOrigin-RevId: 663729812
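
The "1.9-2.3x" figure in the commit title can be checked against the GFLOPS numbers in the commit message. The sketch below is not code from the repository: `Row`, `Speedup`, `kRows`, and `PrintSpeedups` are hypothetical names introduced for illustration. It assumes the fair comparison is between the two MatTA=f32, MatTB=sfp rows, since those are the only configurations listed for both the old and new runs.

```cpp
#include <cassert>
#include <cstdio>

// One benchmark shape, with throughput after ("new") and before ("old") the change.
struct Row {
  const char* shape;  // copied verbatim from the commit message
  double new_gflops;
  double old_gflops;
};

// Hypothetical helper: ratio of new to old throughput.
inline double Speedup(double new_gflops, double old_gflops) {
  assert(old_gflops > 0.0);
  return new_gflops / old_gflops;
}

// The two MatTA=f32, MatTB=sfp rows that appear in both runs.
static const Row kRows[] = {
    {"64, 24576, 3072", 598.6, 257.5},
    {"64, 3072, 24576", 435.6, 231.9},
};

// Prints one "shape: ratio" line per row.
inline void PrintSpeedups() {
  for (const Row& r : kRows) {
    std::printf("%s: %.2fx\n", r.shape, Speedup(r.new_gflops, r.old_gflops));
  }
}
```

Calling `PrintSpeedups()` from a `main` gives ratios of roughly 2.32 and 1.88, which round to the 2.3x and 1.9x bounds quoted in the title. The bf16 rows have no old counterpart in the message, so they are left out of the comparison.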
python	Add Python code for converting Griffin Orbax weights. Refs #301	2024-07-29 12:53:30 -07:00
BUILD	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul	2024-08-16 07:52:20 -07:00
analyze.h	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
blob_store.cc	Make BlobWriter::Add() accept const void*	2024-05-17 08:11:06 -07:00
blob_store.h	Make BlobWriter::Add() accept const void*	2024-05-17 08:11:06 -07:00
compress-inl.h	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul	2024-08-16 07:52:20 -07:00
compress.h	0.98x prefill: refactor in prep for cache blocking.	2024-08-12 09:26:29 -07:00
compress_weights.cc	Refactor configurables.	2024-07-10 21:30:58 -07:00
convert_weights.py	Cleanup: move util/compress and convert_weights to compression/	2024-07-05 04:16:52 -07:00
distortion.h	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
distortion_test.cc	Update distortion.h to weighted average, add distortion_test.	2024-04-17 01:44:19 -07:00
io.cc	Further improve IO, enable multiple backends without -D.	2024-04-19 00:40:29 -07:00
io.h	Major duplicated code reduction in test/benchmarks	2024-06-14 00:16:25 -07:00
io_win.cc	Further improve IO, enable multiple backends without -D.	2024-04-19 00:40:29 -07:00
nuq-inl.h	revert back to HWY_ASSERT for lane constraints, qualify hn::Add	2024-06-04 10:10:18 -07:00
nuq.h	initial commit	2024-02-21 03:31:22 +00:00
nuq_test.cc	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
sfp-inl.h	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul	2024-08-16 07:52:20 -07:00
sfp.h	initial commit	2024-02-21 03:31:22 +00:00
sfp_test.cc	SFP speedup: 1.14x f32, 1.19x bf16 dot = 1.02x prefill	2024-08-05 10:59:13 -07:00
test_util.h	Remove no longer required stats.h - use Highway version instead	2024-06-05 01:37:48 -07:00
weights_raw.h	Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul	2024-08-16 07:52:20 -07:00