gemma.cpp/ops
Jan Wassenberg 2308514e5a Experiment with compensated dot product.
ULP difference from the exact result is 0..1, versus 200-5000 for the previous implementation.
Runtime overhead is 2.5-4x for f32 input.

PiperOrigin-RevId: 668084019
2024-08-27 12:05:35 -07:00
dot-inl.h Experiment with compensated dot product. 2024-08-27 12:05:35 -07:00
dot_test.cc Experiment with compensated dot product. 2024-08-27 12:05:35 -07:00
gemma_matvec_test.cc Fix build issues when tests are enabled 2024-08-12 18:50:23 +02:00
matmul-inl.h Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul 2024-08-16 07:52:20 -07:00
matmul.h Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul 2024-08-16 07:52:20 -07:00
matmul_test.cc Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul 2024-08-16 07:52:20 -07:00
matvec-inl.h Experiment with compensated dot product. 2024-08-27 12:05:35 -07:00
ops-inl.h Minor followup: remainder handling is a single iteration 2024-08-27 01:19:44 -07:00
ops_test.cc VectorizedRopeAndMulBy. 2024-08-18 23:17:01 -07:00
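
The latest commit message refers to a "compensated dot product" with a much smaller ULP error at a 2.5-4x runtime cost. As a rough illustration of the general idea only, here is a minimal scalar sketch using error-free transformations (TwoSum and an FMA-based TwoProduct, in the style of the Dot2 algorithm by Ogita, Rump and Oishi). It is not taken from dot-inl.h: the function names `TwoSum`, `TwoProduct`, `NaiveDot` and `CompensatedDot` are hypothetical, the actual code is vectorized, and it may use a different compensation scheme. The sketch assumes IEEE-754 float arithmetic without `-ffast-math`, and ignores overflow and underflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Error-free sum: returns s = fl(a + b) and stores the rounding error in
// `err`, so that a + b == s + err holds exactly (Knuth's TwoSum).
static inline float TwoSum(float a, float b, float& err) {
  const float s = a + b;
  const float bb = s - a;
  err = (a - (s - bb)) + (b - bb);
  return s;
}

// Error-free product: returns p = fl(a * b) and stores the rounding error in
// `err`. The fused multiply-add computes a * b - p with a single rounding.
static inline float TwoProduct(float a, float b, float& err) {
  const float p = a * b;
  err = std::fma(a, b, -p);
  return p;
}

// Plain accumulation, shown for comparison: every add can lose low bits.
float NaiveDot(const float* x, const float* y, size_t n) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  float sum = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}

// Compensated accumulation: the rounding errors of each product and each
// add are collected in `comp` and folded back into the result at the end.
float CompensatedDot(const float* x, const float* y, size_t n) {
  assert(n == 0 || (x != nullptr && y != nullptr));
  float sum = 0.0f;
  float comp = 0.0f;
  for (size_t i = 0; i < n; ++i) {
    float prod_err, sum_err;
    const float p = TwoProduct(x[i], y[i], prod_err);
    sum = TwoSum(sum, p, sum_err);
    comp += prod_err + sum_err;
  }
  return sum + comp;
}
```

For example, with x = {1e8f, 1.0f, -1e8f} and y = {1.0f, 1.0f, 1.0f}, the exact result is 1. The plain loop absorbs the 1 into 1e8 and loses it, whereas `CompensatedDot` captures it in the error term and returns 1.0f. The extra arithmetic per element is consistent with the kind of overhead the commit message reports, although the figures there apply to the real vectorized implementation, not to this sketch.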