gemma.cpp/ops
Krzysztof Rymski f56d18dd68 Improvements to inference using int8-compressed KVs
Multiplication is done using int16*int16 multiply instructions, avoiding expensive conversion to f32/bf16.
2x speedup on Zen 3.

PiperOrigin-RevId: 888690192
2026-03-24 08:51:30 -07:00
bench_matmul.cc Internal change / remove unused PrintSpeed 2026-01-08 05:26:31 -08:00
dot-inl.h BF16 mixed-mode flash attention 2025-10-29 01:48:28 -07:00
dot_test.cc Add ToFloatSlow, move RandomFloat to test_util 2025-11-27 00:14:51 -08:00
fast_ops-inl.h Use Lookup8 and detail::IsFull(d) in FastSigmoid 2026-03-24 06:36:55 -07:00
fp_arith-inl.h Decouple MatMul from gemma-inl: precompile for all input types 2025-05-27 07:08:58 -07:00
matmul-inl.h 1.01x speedup: improved autotune 2025-10-27 05:35:31 -07:00
matmul.cc Fix excessive KC/MC from prior change 2025-10-28 05:33:01 -07:00
matmul.h Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
matmul_static-inl.h 1.03x speedup: fused FFN 2025-09-15 10:26:37 -07:00
matmul_static.h Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
matmul_static_bf16.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_f32.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_i8.cc Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
matmul_static_nuq.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_sfp.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_test.cc Internal change / remove unused PrintSpeed 2026-01-08 05:26:31 -08:00
ops-inl.h Improvements to inference using int8-compressed KVs 2026-03-24 08:51:30 -07:00
ops.h Added access to softmax attention internals to regular attention 2025-11-21 09:01:01 -08:00
ops_test.cc Implement FastSigmoid. 2026-03-04 06:12:33 -08:00
sum-inl.h Minor cleanup, Windows+Bazel build fixes 2024-10-10 09:05:06 -07:00