gemma.cpp/compression
Commit 31c09cca4c by Jan Wassenberg, 2025-08-28 08:55:50 -07:00

f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX

Add a special case for A=F32,B=BF16, used when there is no native bf16 dot product.

dot-inl: ensure bf16,f32 and f32,bf16 both get promoted to float before f64 summation
matmul.cc: update autotuning to reflect actual A size
matmul_test: add all combinations of bf16/f32, report all results, not just first difference, check non-vector-aligned K

PiperOrigin-RevId: 800487817
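
The dot-inl note in the commit message above (bf16 operands promoted to float before f64 summation) can be illustrated with a scalar sketch. This is not the code from dot-inl, which is vectorized; the names `Bf16BitsToFloat` and `DotF32Bf16` are hypothetical, bf16 values are assumed to be stored as raw 16-bit patterns, and forming the products in double is one possible reading of "f64 summation".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen a raw bf16 bit pattern to float.
// bf16 holds the upper 16 bits of an IEEE-754 binary32 value, so placing
// those bits in the high half of a 32-bit word is an exact conversion.
inline float Bf16BitsToFloat(std::uint16_t bits) {
  const std::uint32_t widened = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &widened, sizeof(f));
  return f;
}

// Hypothetical scalar reference for the mixed case A=F32, B=BF16:
// promote each bf16 element to float, then accumulate in double.
inline double DotF32Bf16(const float* a, const std::uint16_t* b,
                         std::size_t num) {
  assert(num == 0 || (a != nullptr && b != nullptr));
  double sum = 0.0;
  for (std::size_t i = 0; i < num; ++i) {
    const float bf = Bf16BitsToFloat(b[i]);  // bf16 -> f32 promotion
    sum += static_cast<double>(a[i]) * static_cast<double>(bf);
  }
  return sum;
}
```

The swapped case (A=BF16, B=F32) would promote `a[i]` the same way, so both operand orders go through the same float path before the double accumulation.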
Name                Date                        Last commit
python              2025-08-01 13:29:50 -07:00  Automated Code Change
BUILD.bazel         2025-08-01 13:29:50 -07:00  Automated Code Change
analyze.h           2025-05-13 06:53:21 -07:00  Minor: rename compression/shared -> types.h
compress-inl.h      2025-06-17 23:21:59 -07:00  1.02x batch decode speedup: BF16 KV cache
compress.cc         2025-04-16 10:49:43 -07:00  Minor cleanup, on-demand NUQ buffer allocation
compress.h          2025-05-13 06:53:21 -07:00  Minor: rename compression/shared -> types.h
compress_test.cc    2025-06-17 05:44:20 -07:00  Speed up builds by skipping rarely used targets
distortion.h        2024-09-04 09:25:13 -07:00  Refactor/cleanup, remove even_odd
distortion_test.cc  2025-05-13 06:53:21 -07:00  Minor: rename compression/shared -> types.h
nuq-inl.h           2025-05-13 06:53:21 -07:00  Minor: rename compression/shared -> types.h
nuq_test.cc         2025-06-17 05:44:20 -07:00  Speed up builds by skipping rarely used targets
sfp-inl.h           2025-05-13 06:53:21 -07:00  Minor: rename compression/shared -> types.h
sfp_test.cc         2025-06-17 05:44:20 -07:00  Speed up builds by skipping rarely used targets
test_util-inl.h     2025-08-28 08:55:50 -07:00  f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX
types.h             2025-06-17 05:44:20 -07:00  Speed up builds by skipping rarely used targets