gemma.cpp

Jan Wassenberg f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning. Remove empty matmul_unit_test. Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576. PiperOrigin-RevId: 729123576		2025-02-20 08:33:46 -08:00
allocator.cc	Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning	2025-02-20 08:33:46 -08:00
allocator.h	Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning	2025-02-20 08:33:46 -08:00
app.h	Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning	2025-02-20 08:33:46 -08:00
args.h	Simpler MatMul interface, vocab types, Tristate for use_spinning	2024-11-04 07:48:29 -08:00
basics.h	Use vectorized TopK using highway VQSelect	2025-02-18 05:01:39 -08:00
test_util.h	Minor cleanup/fixes:	2024-09-09 06:58:09 -07:00
threading.cc	Allow overriding num threads despite detecting topology	2025-01-27 08:57:53 -08:00
threading.h	Only temporarily enable spinning in threading benchmark	2025-02-14 17:15:38 -08:00
threading_test.cc	Only temporarily enable spinning in threading benchmark	2025-02-14 17:15:38 -08:00