gemma.cpp/compression/python
Latest commit: Jan Wassenberg e890d46f30
1.31x batch prefill, 1.24x batch decode speedup: NUMA binding.
Bind only the weights; binding the MatMul output worsens batch=1 prefill.
Update gemma_batch_bench to use --decode_qbatch.
Fix/remove prefill_activations in gemma-inl.h.

Refactor:
- Use BasePageBytes directly when binding.
- Move BindB/C to .cc by de-templatizing.
- Remove MatOwners::AllocateFor because it is weights-specific (binding or not).
- Disband MatOwners; replace it with a vector.

PiperOrigin-RevId: 759610477
2025-05-16 07:42:13 -07:00
pytree                   | Add Python code for converting Griffin Orbax weights. Refs #301  | 2024-07-29 12:53:30 -07:00
BUILD.bazel              | Minor: rename compression/shared -> types.h                      | 2025-05-13 06:53:21 -07:00
compression_clif_aux.cc  | 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding    | 2025-05-16 07:42:13 -07:00
compression_clif_aux.h   | 3.8x speedup of weights loading via preadv on Linux              | 2025-05-15 11:55:15 -07:00
compression_extension.cc | Minor: rename compression/shared -> types.h                      | 2025-05-13 06:53:21 -07:00
compression_test.py      | Huge refactor of weight handling and model loading.              | 2025-05-06 04:44:21 -07:00
requirements.txt         | Add python wrappers for configs and inference.                   | 2025-01-28 08:22:03 -08:00