gemma.cpp

Jan Wassenberg	e890d46f30	2025-05-16 07:42:13 -07:00
1.31x batch prefill, 1.24x batch decode speedup: NUMA binding

Only the weights; binding MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.
Refactor: use BasePageBytes directly when binding
Move BindB/C to .cc by de-templatizing
Remove MatOwners::AllocateFor because it is weights-specific (binding or not)
Disband MatOwners, replace with vector
PiperOrigin-RevId: 759610477
bindings	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
evals	Add MMLU eval to github	2024-05-20 10:20:53 -07:00
instantiations	Eliminated TConfig.	2024-10-17 05:04:22 -07:00
activations.h	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00
common.cc	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
common.h	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
configs.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
configs.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
configs_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
gemma-inl.h	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00
gemma.cc	3.8x speedup of weights loading via preadv on Linux	2025-05-15 11:55:15 -07:00
gemma.h	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
gemma_args.h	Cleanup: remove unused kCyclic, remove 2 suffix	2025-05-13 01:06:41 -07:00
kv_cache.cc	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
kv_cache.h	Cleanup: remove unused kCyclic, remove 2 suffix	2025-05-13 01:06:41 -07:00
model_store.cc	Fix the wrapping field of the deduced model config	2025-05-13 23:02:03 +08:00
model_store.h	Move fields, io* and blob* from compression/ into io/	2025-05-06 11:17:19 -07:00
run.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
tensor_info.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
tensor_info.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
tensor_info_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
tokenizer.cc	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
tokenizer.h	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
weights.cc	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00
weights.h	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00