gemma.cpp/ops
Apoorv Reddy c6eb3b6f0d VectorizedRopeAndMulBy.
~8x speedup in RoPE (tested on a few prompts).
~3.8% prefill latency improvement.
~2.6% decode latency improvement.

PiperOrigin-RevId: 664650108
2024-08-18 23:17:01 -07:00
File                  Last commit                                                 Date
gemma_matvec_test.cc  Fix build issues when tests are enabled                     2024-08-12 18:50:23 +02:00
matmul-inl.h          Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul  2024-08-16 07:52:20 -07:00
matmul.h              Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul  2024-08-16 07:52:20 -07:00
matmul_test.cc        Major MatMul update, 1.9-2.3x speedup on Zen4 via bf16 mul  2024-08-16 07:52:20 -07:00
matvec-inl.h          Split matmul into matvec; add large matrix benchmark        2024-07-30 08:29:11 -07:00
ops-inl.h             VectorizedRopeAndMulBy.                                     2024-08-18 23:17:01 -07:00
ops_test.cc           VectorizedRopeAndMulBy.                                     2024-08-18 23:17:01 -07:00