gemma.cpp/backprop
Jan Wassenberg e890d46f30 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding
Only the weights; binding MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.

Refactor:
- use BasePageBytes directly when binding
- Move BindB/C to .cc by de-templatizing
- Remove MatOwners::AllocateFor because it is weights-specific (binding or not)
- Disband MatOwners, replace with vector
PiperOrigin-RevId: 759610477
2025-05-16 07:42:13 -07:00
activations.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
backward-inl.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
backward.cc Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
backward.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
backward_scalar.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
backward_test.cc Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
common_scalar.h Major refactor of allocator/args: 2025-04-10 01:29:54 -07:00
forward-inl.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
forward.cc Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
forward.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
forward_scalar.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
optimize_test.cc Split W1/W2 as a load-time preprocess. 2025-05-13 07:39:59 -07:00
optimizer.cc Huge refactor of weight handling and model loading. 2025-05-06 04:44:21 -07:00
optimizer.h Huge refactor of weight handling and model loading. 2025-05-06 04:44:21 -07:00
prompt.h Add missing include 2024-06-04 10:29:12 +00:00
sampler.h Add config for att/final cap, skip max-subtract. Fixes #278 2024-07-01 09:45:26 -07:00
test_util.h 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding 2025-05-16 07:42:13 -07:00