gemma.cpp/backprop
Jan Wassenberg f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
Remove empty matmul_unit_test.
Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576.

PiperOrigin-RevId: 729123576
2025-02-20 08:33:46 -08:00
activations.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
backward-inl.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
backward.cc Eliminated TConfig. 2024-10-17 05:04:22 -07:00
backward.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
backward_scalar.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
backward_scalar_test.cc Added ability to load/save a complete model file, including tokenizer. 2024-12-19 07:59:41 -08:00
backward_test.cc Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning 2025-02-20 08:33:46 -08:00
common_scalar.h Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. 2024-10-10 08:22:30 -07:00
forward-inl.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
forward.cc Eliminated TConfig. 2024-10-17 05:04:22 -07:00
forward.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
forward_scalar.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
optimize_test.cc Infra improvements (2) 2025-01-23 01:55:19 -08:00
optimizer.cc Eliminated TConfig. 2024-10-17 05:04:22 -07:00
optimizer.h Eliminated TConfig. 2024-10-17 05:04:22 -07:00
prompt.h Add missing include 2024-06-04 10:29:12 +00:00
sampler.h Add config for att/final cap, skip max-subtract. Fixes #278 2024-07-01 09:45:26 -07:00
test_util.h Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT 2024-12-11 06:30:28 -08:00