gemma.cpp

Jan Wassenberg e890d46f30 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding. Only the weights; binding MatMul output worsens batch=1 prefill. Update gemma_batch_bench to use --decode_qbatch. Fix/remove prefill_activations in gemma-inl.h. Refactor: use BasePageBytes directly when binding; move BindB/C to .cc by de-templatizing; remove MatOwners::AllocateFor because it is weights-specific (binding or not); disband MatOwners, replace with vector. PiperOrigin-RevId: 759610477		2025-05-16 07:42:13 -07:00
python	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding	2025-05-16 07:42:13 -07:00
BUILD.bazel	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
analyze.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
compress-inl.h	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
compress.cc	Minor cleanup, on-demand NUQ buffer allocation	2025-04-16 10:49:43 -07:00
compress.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
compress_test.cc	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
distortion.h	Refactor/cleanup, remove even_odd	2024-09-04 09:25:13 -07:00
distortion_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
nuq-inl.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
nuq_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
sfp-inl.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
sfp_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
test_util-inl.h	Major refactor of allocator/args:	2025-04-10 01:29:54 -07:00
types.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00