gemma.cpp

Jan Wassenberg e890d46f30	2025-05-16 07:42:13 -07:00
1.31x batch prefill, 1.24x batch decode speedup: NUMA binding
Only the weights; binding MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.
Refactor: use BasePageBytes directly when binding
Move BindB/C to .cc by de-templatizing
Remove MatOwners::AllocateFor because it is weights-specific (binding or not)
Disband MatOwners, replace with vector
PiperOrigin-RevId: 759610477
activations.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
backward-inl.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
backward.cc	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
backward.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
backward_scalar.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
backward_test.cc	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
common_scalar.h	Major refactor of allocator/args:	2025-04-10 01:29:54 -07:00
forward-inl.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
forward.cc	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
forward.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
forward_scalar.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
optimize_test.cc	Split W1/W2 as a load-time preprocess.	2025-05-13 07:39:59 -07:00
optimizer.cc	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
optimizer.h	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
prompt.h	Add missing include	2024-06-04 10:29:12 +00:00
sampler.h	Add config for att/final cap, skip max-subtract. Fixes #278	2024-07-01 09:45:26 -07:00
test_util.h	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00