gemma.cpp/evals
Jan Wassenberg (e890d46f30), 2025-05-16 07:42:13 -07:00
1.31x batch prefill, 1.24x batch decode speedup: NUMA binding

Bind only the weights; binding the MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.

Refactor:
- Use BasePageBytes directly when binding.
- Move BindB/C to .cc by de-templatizing.
- Remove MatOwners::AllocateFor because it is weights-specific (binding or not).
- Disband MatOwners; replace it with a vector.

PiperOrigin-RevId: 759610477
benchmark.cc          Replace RowVectorBatch with MatStorageT  2025-05-12 09:16:12 -07:00
benchmark_helper.cc   Minor: rename compression/shared -> types.h  2025-05-13 06:53:21 -07:00
benchmark_helper.h    Rename-only: remove Allocator2 etc suffixes now that refactoring is complete  2025-05-06 09:12:43 -07:00
benchmarks.cc         Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.  2024-10-14 04:45:21 -07:00
cross_entropy.cc      Huge refactor of weight handling and model loading.  2025-05-06 04:44:21 -07:00
cross_entropy.h       Huge refactor of weight handling and model loading.  2025-05-06 04:44:21 -07:00
debug_prompt.cc       Move fields, io* and blob* from compression/ into io/  2025-05-06 11:17:19 -07:00
gemma_batch_bench.cc  1.31x batch prefill, 1.24x batch decode speedup: NUMA binding  2025-05-16 07:42:13 -07:00
gemma_test.cc         Minor: mark command line flags as required  2025-05-12 08:30:44 -07:00
prompts.h             Benchmark gemma.cpp with different length inputs.  2024-10-10 15:59:26 -07:00
run_mmlu.cc           Move fields, io* and blob* from compression/ into io/  2025-05-06 11:17:19 -07:00