gemma.cpp

Jan Wassenberg 31c09cca4c	2025-08-28 08:55:50 -07:00
f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX
Add a special case for A=F32,B=BF16, used when there is no native bf16 dot product.
dot-inl: ensure bf16,f32 and f32,bf16 both get promoted to float before f64 summation
matmul.cc: update autotuning to reflect actual A size
matmul_test: add all combinations of bf16/f32, report all results, not just first difference, check non-vector-aligned K
PiperOrigin-RevId: 800487817
python	Automated Code Change	2025-08-01 13:29:50 -07:00
BUILD.bazel	Automated Code Change	2025-08-01 13:29:50 -07:00
analyze.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
compress-inl.h	1.02x batch decode speedup: BF16 KV cache	2025-06-17 23:21:59 -07:00
compress.cc	Minor cleanup, on-demand NUQ buffer allocation	2025-04-16 10:49:43 -07:00
compress.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
compress_test.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
distortion.h	Refactor/cleanup, remove even_odd	2024-09-04 09:25:13 -07:00
distortion_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
nuq-inl.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
nuq_test.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
sfp-inl.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
sfp_test.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
test_util-inl.h	f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX	2025-08-28 08:55:50 -07:00
types.h	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00