gemma.cpp/util
Jan Wassenberg f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
Remove empty matmul_unit_test.
Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576.

PiperOrigin-RevId: 729123576
2025-02-20 08:33:46 -08:00
allocator.cc Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning 2025-02-20 08:33:46 -08:00
allocator.h Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning 2025-02-20 08:33:46 -08:00
app.h Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning 2025-02-20 08:33:46 -08:00
args.h Simpler MatMul interface, vocab types, Tristate for use_spinning 2024-11-04 07:48:29 -08:00
basics.h Use vectorized TopK using highway VQSelect 2025-02-18 05:01:39 -08:00
test_util.h Minor cleanup/fixes: 2024-09-09 06:58:09 -07:00
threading.cc Allow overriding num threads despite detecting topology 2025-01-27 08:57:53 -08:00
threading.h Only temporarily enable spinning in threading benchmark 2025-02-14 17:15:38 -08:00
threading_test.cc Only temporarily enable spinning in threading benchmark 2025-02-14 17:15:38 -08:00