gemma.cpp/ops
Jan Wassenberg f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
Remove empty matmul_unit_test.
Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576.

PiperOrigin-RevId: 729123576
2025-02-20 08:33:46 -08:00
File                  Last updated                Last commit
bench_matmul.cc       2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
dot-inl.h             2025-01-23 01:55:19 -08:00  Infra improvements (2)
dot_test.cc           2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
fp_arith-inl.h        2024-09-20 10:31:23 -07:00  Cascaded summation for Softmax
gemma_matvec_test.cc  2024-10-17 05:04:22 -07:00  Eliminated TConfig.
matmul-inl.h          2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
matmul.cc             2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
matmul.h              2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
matmul_test.cc        2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
matvec-inl.h          2025-01-23 01:55:19 -08:00  Infra improvements (2)
ops-inl.h             2025-02-18 05:01:39 -08:00  Use vectorized TopK using highway VQSelect
ops.h                 2025-01-23 01:55:19 -08:00  Infra improvements (2)
ops_test.cc           2025-02-20 08:33:46 -08:00  Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
sum-inl.h             2024-10-10 09:05:06 -07:00  Minor cleanup, Windows+Bazel build fixes