gemma.cpp/ops
Jan Wassenberg 31c09cca4c f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX
Add a special case for A=F32,B=BF16, used when there is no native bf16 dot product.

dot-inl: ensure bf16,f32 and f32,bf16 both get promoted to float before f64 summation
matmul.cc: update autotuning to reflect actual A size
matmul_test: add all combinations of bf16/f32, report all results, not just first difference, check non-vector-aligned K
PiperOrigin-RevId: 800487817
2025-08-28 08:55:50 -07:00
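The dot-inl note in the commit message above (bf16,f32 and f32,bf16 are both promoted to float before f64 summation) can be illustrated with a scalar sketch. This is not the SIMD code in dot-inl.h. The names `BF16ToF32`, `F32ToBF16Truncate`, `DotBF16F32` and `DotF32BF16` are hypothetical, bf16 values are modeled as raw `uint16_t` bit patterns, and forming the product in double is an assumption made for simplicity.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a raw bf16 bit pattern to float.
// bf16 holds the upper 16 bits of an IEEE-754 binary32 value.
inline float BF16ToF32(uint16_t bits) {
  const uint32_t wide = static_cast<uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &wide, sizeof(f));
  return f;
}

// Hypothetical helper: truncate a float to bf16 (no rounding).
// Only used here to build example inputs.
inline uint16_t F32ToBF16Truncate(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Mixed-type dot product: the bf16 operand is first promoted to float,
// then products are accumulated in a double (f64) sum.
inline double DotBF16F32(const uint16_t* a, const float* b, size_t num) {
  double sum = 0.0;
  for (size_t i = 0; i < num; ++i) {
    const float fa = BF16ToF32(a[i]);  // promote bf16 -> f32
    sum += static_cast<double>(fa) * static_cast<double>(b[i]);
  }
  return sum;
}

// Swapped argument order goes through the same promotion path, so both
// (bf16,f32) and (f32,bf16) produce identical results.
inline double DotF32BF16(const float* a, const uint16_t* b, size_t num) {
  return DotBF16F32(b, a, num);
}
```

Routing both argument orders through one promotion path means the result cannot depend on which operand happens to be bf16.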
bench_matmul.cc De-singleton ThreadingContext so callers can pass in their own 2025-07-22 02:08:46 -07:00
dot-inl.h f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX 2025-08-28 08:55:50 -07:00
dot_test.cc De-singleton ThreadingContext so callers can pass in their own 2025-07-22 02:08:46 -07:00
fp_arith-inl.h Decouple MatMul from gemma-inl: precompile for all input types 2025-05-27 07:08:58 -07:00
gemma_matvec_test.cc De-singleton ThreadingContext so callers can pass in their own 2025-07-22 02:08:46 -07:00
matmul-inl.h f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX 2025-08-28 08:55:50 -07:00
matmul.cc f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX 2025-08-28 08:55:50 -07:00
matmul.h f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX 2025-08-28 08:55:50 -07:00
matmul_static-inl.h 1.16x decode speedup: remove last MatVec in Attention 2025-06-02 09:40:29 -07:00
matmul_static.h 1.16x decode speedup: remove last MatVec in Attention 2025-06-02 09:40:29 -07:00
matmul_static_bf16.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_f32.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_nuq.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_sfp.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_test.cc f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX 2025-08-28 08:55:50 -07:00
matvec-inl.h Replace last ConstMat with MatPtr 2025-05-13 10:55:22 -07:00
ops-inl.h Minor: batched NotifyGenerate, fix comment/dep 2025-08-26 23:33:17 -07:00
ops.h De-singleton ThreadingContext so callers can pass in their own 2025-07-22 02:08:46 -07:00
ops_test.cc (Resubmit) Prepare profiler annotations for new API 2025-08-13 01:38:24 -07:00
sum-inl.h Minor cleanup, Windows+Bazel build fixes 2024-10-10 09:05:06 -07:00