gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Charles Zhao	50ee1a3e92	Write SBS progressively. (1) Write directly to the file in BlobWriter::Add and destruct the MatOwner to release the RAM. (2) Write a fake header to indicate this is V2, and write the correct header and directory at the end of the file. (3) Tested loading SBS files written the old way and the new way; both worked. PiperOrigin-RevId: 789306837	2025-07-31 06:05:38 -07:00
Jan Wassenberg	e76e29ce11	De-singleton ThreadingContext so callers can pass in their own. weights.cc: fix BindB argument for bf16 tensors. threading_test: enable autotune. PiperOrigin-RevId: 785763618	2025-07-22 02:08:46 -07:00
Jan Wassenberg	cd80d8b24d	Speed up builds by skipping rarely used targets. Centralize previous code into GEMMA_DISABLED_TARGETS. PiperOrigin-RevId: 772433723	2025-06-17 05:44:20 -07:00
Jan Wassenberg	e890d46f30	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding. Only the weights; binding MatMul output worsens batch=1 prefill. Update gemma_batch_bench to use --decode_qbatch. Fix/remove prefill_activations in gemma-inl.h. Refactor: use BasePageBytes directly when binding. Move BindB/C to .cc by de-templatizing. Remove MatOwners::AllocateFor because it is weights-specific (binding or not). Disband MatOwners, replace with vector. PiperOrigin-RevId: 759610477	2025-05-16 07:42:13 -07:00
Jan Wassenberg	c443adee33	3.8x speedup of weights loading via preadv on Linux. Also move BlobReader reading functionality to weights.cc. PiperOrigin-RevId: 759240310	2025-05-15 11:55:15 -07:00
Jan Wassenberg	c8d92948f4	Move fields, io* and blob* from compression/ into io/ PiperOrigin-RevId: 755445712	2025-05-06 11:17:19 -07:00
Jan Wassenberg	275135d7e8	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete PiperOrigin-RevId: 755397220	2025-05-06 09:12:43 -07:00
Jan Wassenberg	8d0882b966	Huge refactor of weight handling and model loading. Weight handling: new ModelStore2 supports both pre-2025 multi-file and single-file formats; simpler ForEachTensor with TensorArgs; tensors are constructed with their full suffixed name. I/O: support mmap and stride; simplified SbsWriter, single insert(); add SbsReader. Misc: kMockTokenizer: allow creating with unavailable tokenizer; configs.h: simpler enum validity checks via kSentinel; matmul.h: remove unused enable_bind (now in allocator.h); tensor_info: single TensorInfoRegistry class, rename from tensor_index.h. Frontends: replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&); deduce model/weight type, remove --model and parsing; replace most common.h includes with configs.h; remove --compressed_weights, use --weights instead; remove ModelInfo, replaced by ModelConfig. Backprop: reduce max loss, remove backward_scalar_test (timeout); update thresholds because new RandInit changes rng eval order and thus numerics. PiperOrigin-RevId: 755317484	2025-05-06 04:44:21 -07:00
Jan Wassenberg	8532da47f7	Major refactor of allocator/args: use new ThreadingContext2 instead of monostate/init in each frontend. Add ThreadingArgs (replaces AppArgs). backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride. compress_weights: remove, moving to py-only exporter instead. Move MatPtr to mat.h and revise interface: generic MatOwner; rename accessors to Packed*; support stride/row accessors, fix RowPtr stride. Add TypeBits(Type). Move GenerateMat to test_util-inl for sharing between matmul test/bench. Move internal init to gemma.cc to avoid duplication. Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage. Remove --compressed_weights, use --weights instead. tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img. Allocator: use normal unique_ptr for AllocBytes so users can call directly. threading: use -> because AlignedPtr no longer assumes arrays. PiperOrigin-RevId: 745918637	2025-04-10 01:29:54 -07:00
Phil Culliton	7ccc6abe87	Allow conversion, loading and inference with NUQ. PiperOrigin-RevId: 723507890	2025-02-05 07:45:54 -08:00
Daniel Keysers	7af2e70321	Add python wrappers for configs and inference. Enable building compression/python/compression_test using bazel. Add default image path for image_test and paligemma_test. PiperOrigin-RevId: 720583438	2025-01-28 08:22:03 -08:00
Daniel Keysers	bcdb0d65bd	Assorted small cleanups. PiperOrigin-RevId: 720548132	2025-01-28 06:09:45 -08:00
Ray Smith	9d40f0117e	Added ability to load/save a complete model file, including tokenizer. PiperOrigin-RevId: 707914366	2024-12-19 07:59:41 -08:00
Ray Smith	e69bc3bc1c	Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future. Corrected some errors in the TensorIndex. PiperOrigin-RevId: 705014619	2024-12-11 01:26:35 -08:00
Ray Smith	3d1625d8c5	Improved consistency of compressor API, and added a universal method with a target type arg. Moved configs pybind up to root level. PiperOrigin-RevId: 698743417	2024-11-21 05:27:40 -08:00
Paul Chang	5674c33dc5	Replace CLIF SbsWriter with pybind-based gcpp extension. Maintains compatibility with previous version. PiperOrigin-RevId: 696181603	2024-11-13 10:20:02 -08:00
Ray Smith	85958f5fd3	Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. Definition of array size is moved to the constructor. Allocation is separate and parallelized. All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted. Replaced all previous ForEachTensor functions with a single unified function. PiperOrigin-RevId: 684451604	2024-10-10 08:22:30 -07:00
Daniel Keysers	1c8ddcdffe	Adds insert_float() to SbsWriter() to store a float array directly. PiperOrigin-RevId: 673982528	2024-09-12 13:27:24 -07:00
Jan Wassenberg	13a9f76f64	Fix mismatch between blob_store and compress interfaces (bytes) PiperOrigin-RevId: 673027268	2024-09-10 10:59:17 -07:00
Jan Wassenberg	5c0da8c8c3	Minor cleanup/fixes: optimize_test simplify prompt check; fix SFP arg case; fix includes; align inputs in test; IsInside: add DASSERT; fix PerClusterPool NumThreads. PiperOrigin-RevId: 672530385	2024-09-09 06:58:09 -07:00
Jan Wassenberg	c29e9752c7	Refactor/cleanup, remove even_odd: new compression/shared.h, remove sfp.h; remove unused DistortionStats b_l1_; move exact arithmetic functions into fp_arith; remove even_odd optimization for MatVec (mostly unused); use BF16 typedef more widely; add kMaxSFP constant. PiperOrigin-RevId: 670996386	2024-09-04 09:25:13 -07:00
The gemma.cpp Authors	c1f243c351	Fix setting scales in Py binding PiperOrigin-RevId: 655284183	2024-07-23 13:32:50 -07:00
Jan Wassenberg	f823371691	Cleanup: move util/compress and convert_weights to compression/. Also remove unused models/, lint convert_weights. PiperOrigin-RevId: 649613088	2024-07-05 04:16:52 -07:00
Jan Wassenberg	41efec4dba	Add Py bindings for weight compression. TODO: this uses clif instead of pybind11, and depends on absl. PiperOrigin-RevId: 649575815	2024-07-05 01:06:00 -07:00

24 Commits