gemma.cpp/ops
Jan Wassenberg 8e028632f7 0.98x prefill: refactor in prep for cache blocking.
Slower because we now init tiles of C and accumulate into them.

Also remove unused var in optimize_test and use BF16 typedef.

PiperOrigin-RevId: 662115916
2024-08-12 09:26:29 -07:00
matmul-inl.h 0.98x prefill: refactor in prep for cache blocking. 2024-08-12 09:26:29 -07:00
matmul_test.cc 0.98x prefill: refactor in prep for cache blocking. 2024-08-12 09:26:29 -07:00
matvec-inl.h Split matmul into matvec; add large matrix benchmark 2024-07-30 08:29:11 -07:00
matvec_test.cc Split matmul into matvec; add large matrix benchmark 2024-07-30 08:29:11 -07:00
ops-inl.h 1.03-1.08x decode speedup: precompute Rope theta, fuse 2024-08-09 01:23:24 -07:00
ops_test.cc Split up ops.h into ops/ops-inl and matmul-inl 2024-07-19 11:21:48 -07:00