gemma.cpp/ops
Jan Wassenberg e890d46f30 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding
Bind only the weights; binding the MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.

Refactor:
- Use BasePageBytes directly when binding
- Move BindB/C to .cc by de-templatizing
- Remove MatOwners::AllocateFor because it is weights-specific (binding or not)
- Disband MatOwners, replace with vector
PiperOrigin-RevId: 759610477
2025-05-16 07:42:13 -07:00
bench_matmul.cc 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding 2025-05-16 07:42:13 -07:00
dot-inl.h Huge refactor of weight handling and model loading. 2025-05-06 04:44:21 -07:00
dot_test.cc Minor: rename compression/shared -> types.h 2025-05-13 06:53:21 -07:00
fp_arith-inl.h Cascaded summation for Softmax 2024-09-20 10:31:23 -07:00
gemma_matvec_test.cc Huge refactor of weight handling and model loading. 2025-05-06 04:44:21 -07:00
matmul-inl.h Replace last ConstMat with MatPtr 2025-05-13 10:55:22 -07:00
matmul.cc 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding 2025-05-16 07:42:13 -07:00
matmul.h 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding 2025-05-16 07:42:13 -07:00
matmul_test.cc Minor: rename compression/shared -> types.h 2025-05-13 06:53:21 -07:00
matvec-inl.h Replace last ConstMat with MatPtr 2025-05-13 10:55:22 -07:00
ops-inl.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
ops.h Replace RowVectorBatch with MatStorageT 2025-05-12 09:16:12 -07:00
ops_test.cc Cleanup: remove unused kCyclic, remove 2 suffix 2025-05-13 01:06:41 -07:00
sum-inl.h Minor cleanup, Windows+Bazel build fixes 2024-10-10 09:05:06 -07:00