gemma.cpp/ops
Jan Wassenberg 2ebbe4076f 1.03-1.08x decode speedup: precompute Rope theta, fuse
Split attention into functions, move into class.
Fuse Rope and MulBy, allow non-in-place version to avoid copy from q to KV.
Sink if() into MaybeLogitsSoftCap.

PiperOrigin-RevId: 661168418
2024-08-09 01:23:24 -07:00
File            Last commit                                             Date
matmul-inl.h    Split matmul into matvec; add large matrix benchmark    2024-07-30 08:29:11 -07:00
matmul_test.cc  Split matmul into matvec; add large matrix benchmark    2024-07-30 08:29:11 -07:00
matvec-inl.h    Split matmul into matvec; add large matrix benchmark    2024-07-30 08:29:11 -07:00
matvec_test.cc  Split matmul into matvec; add large matrix benchmark    2024-07-30 08:29:11 -07:00
ops-inl.h       1.03-1.08x decode speedup: precompute Rope theta, fuse  2024-08-09 01:23:24 -07:00
ops_test.cc     Split up ops.h into ops/ops-inl and matmul-inl          2024-07-19 11:21:48 -07:00