gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	cd80d8b24d	Speed up builds by skipping rarely used targets Centralize previous code into GEMMA_DISABLED_TARGETS PiperOrigin-RevId: 772433723	2025-06-17 05:44:20 -07:00
Jan Wassenberg	794a21a4e6	Major refactor to de-templatize gemma-inl and weights This replaces per-weight instantiations of all code with only per-MatMul/norm. Reduces binary size by 133KiB. WeightsOwner is no longer required for type erasing, hence it is replaced with ModelWeightsPtrs. Also remove unused EmbedToken, replaced with EmbedMMToken. PiperOrigin-RevId: 766497657	2025-06-02 23:01:35 -07:00
Jan Wassenberg	cf4d7ceb82	1.16x decode speedup: remove last MatVec in Attention Precompute row pointers. Remove no longer used MHA support; QStride -> qkv_dim. Remove RowPtr from MatMul interface, use only MatPtrT. Require opt-in define for NUQ to speed up builds. Also fix io.cc on Windows. PiperOrigin-RevId: 766228108	2025-06-02 09:40:29 -07:00
Jan Wassenberg	3890eb5412	Remove backprop/ Also remove MatPtrT::Packed(); use PackedScale1 instead where const, or Row(0). PiperOrigin-RevId: 764243198	2025-05-28 07:01:17 -07:00
Jan Wassenberg	e890d46f30	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding Only the weights; binding MatMul output worsens batch=1 prefill. Update gemma_batch_bench to use --decode_qbatch. Fix/remove prefill_activations in gemma-inl.h. Refactor: use BasePageBytes directly when binding Move BindB/C to .cc by de-templatizing Remove MatOwners::AllocateFor because it is weights-specific (binding or not) Disband MatOwners, replace with vector PiperOrigin-RevId: 759610477	2025-05-16 07:42:13 -07:00
Jan Wassenberg	c443adee33	3.8x speedup of weights loading via preadv on Linux Also move BlobReader reading functionality to weights.cc PiperOrigin-RevId: 759240310	2025-05-15 11:55:15 -07:00
Jan Wassenberg	2038dfd9cc	Minor: rename compression/shared -> types.h PiperOrigin-RevId: 758199851	2025-05-13 06:53:21 -07:00
Jan Wassenberg	45ad847a41	Replace RowVectorBatch with MatStorageT KVCache: add ctor required for MatStorageT, remove Create; bf_pre_ffw_rms_out -> pre_ffw_rms_out optimize_test: larger vocab_size requires more steps shared.h: Remove unused u128 type correctly set Activation matrix rows, avoid passing as arg ops: pass Mat instead of pointers/sizes; vectorize LayerNorm; support any weight type mat: add OverrideRows, used by SetBatchSize PiperOrigin-RevId: 757790736	2025-05-12 09:16:12 -07:00
Jan Wassenberg	c8d92948f4	Move fields, io* and blob* from compression/ into io/ PiperOrigin-RevId: 755445712	2025-05-06 11:17:19 -07:00
Jan Wassenberg	275135d7e8	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete PiperOrigin-RevId: 755397220	2025-05-06 09:12:43 -07:00
Jan Wassenberg	8d0882b966	Huge refactor of weight handling and model loading. Weight handling: - new ModelStore2 supports both pre-2025 multi-file and single-file formats - simpler ForEachTensor with TensorArgs - tensors are constructed with their full suffixed name I/O: - support mmap and stride - Simplified SbsWriter, single insert(); add SbsReader Misc: - kMockTokenizer: allow creating with unavailable tokenizer - configs.h: Simpler enum validity checks via kSentinel - matmul.h: remove unused enable_bind (now in allocator.h) - tensor_info: single TensorInfoRegistry class, rename from tensor_index.h Frontends: - Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&) - Deduce model/weight type, remove --model and parsing - Replace most common.h includes with configs.h - Remove --compressed_weights, use --weights instead - Remove ModelInfo, replaced by ModelConfig. Backprop: - Reduce max loss, remove backward_scalar_test (timeout) - Update thresholds because new RandInit changes rng eval order and thus numerics PiperOrigin-RevId: 755317484	2025-05-06 04:44:21 -07:00
Jan Wassenberg	87a658b1c6	Minor cleanup, on-demand NUQ buffer allocation threading_context: add profiler compress-inl: add constexpr, on-demand alloc NUQ buffer gemma_py: model->gemma Move ScaleWeights to compress.cc Move PromptWrapping to configs.h PiperOrigin-RevId: 748347896	2025-04-16 10:49:43 -07:00
Jan Wassenberg	2e722f14f1	Add mmap support (not yet used) Also: const-correct ArgsBase, add assert to mat.h checking element_bytes_ BUILD deps update (:shared provides shared.h, not :sfp) PiperOrigin-RevId: 746073312	2025-04-10 10:03:40 -07:00
Jan Wassenberg	8532da47f7	Major refactor of allocator/args: use new ThreadingContext2 instead of monostate/init in each frontend Add ThreadingArgs(replaces AppArgs) backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride compress_weights: remove, moving to py-only exporter instead Move MatPtr to mat.h and revise interface: - Generic MatOwner - rename accessors to Packed* - support stride/row accessors, fix RowPtr stride Add TypeBits(Type) Move GenerateMat to test_util-inl for sharing between matmul test/bench Move internal init to gemma.cc to avoid duplication Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage Remove --compressed_weights, use --weights instead. tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img Allocator: use normal unique_ptr for AllocBytes so users can call directly threading: use -> because AlignedPtr no longer assumes arrays PiperOrigin-RevId: 745918637	2025-04-10 01:29:54 -07:00
Jan Wassenberg	4e6aa36e9b	Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes PiperOrigin-RevId: 745068776	2025-04-08 03:35:55 -07:00
Jan Wassenberg	1b72c22345	Refactor Gemma ctor and improve pool NUMA support Gemma receives a MatMulEnv arg, with comment on lifetime Split threading into topology so the latter can be used in allocator Add AllocClasses() for non-POD (ThreadPool) Support binding pool to NUMA node Update threading_test with latency measurements Also update Highway version. PiperOrigin-RevId: 736904748	2025-03-14 10:19:00 -07:00
Phil Culliton	4ab601da10	Internal change. PiperOrigin-RevId: 736015810	2025-03-11 23:20:20 -07:00
Phil Culliton	9d83ff202e	Internal change. PiperOrigin-RevId: 736014152	2025-03-11 23:10:48 -07:00
Jan Wassenberg	953c877658	Fix nuq Enc() to handle groups < kGroupSize. Also remove no longer required dynamic allocation. PiperOrigin-RevId: 725203824	2025-02-10 07:17:59 -08:00
Jan Wassenberg	b0fe9a43e6	Further speed up blob_compare: single alloc, use dual sockets PiperOrigin-RevId: 724947361	2025-02-09 10:53:49 -08:00
Jan Wassenberg	b18bd781f6	Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts PiperOrigin-RevId: 724340518	2025-02-07 07:38:55 -08:00
Jan Wassenberg	f31e12e63b	Improved blob diff: parallel, tolerance for float PiperOrigin-RevId: 724060325	2025-02-06 13:46:28 -08:00
Jan Wassenberg	9f5159ff68	Public visibility for compression/ PiperOrigin-RevId: 723529541	2025-02-05 08:53:51 -08:00
Phil Culliton	7ccc6abe87	Allow conversion, loading and inference with NUQ. PiperOrigin-RevId: 723507890	2025-02-05 07:45:54 -08:00
Phil Culliton	8a6edff319	Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests. PiperOrigin-RevId: 721821577	2025-01-31 10:35:32 -08:00
Daniel Keysers	7af2e70321	Add python wrappers for configs and inference. Enable building compression/python/compression_test using bazel. Add default image path for image_test and paligemma_test. PiperOrigin-RevId: 720583438	2025-01-28 08:22:03 -08:00
Daniel Keysers	bcdb0d65bd	Assorted small cleanups. PiperOrigin-RevId: 720548132	2025-01-28 06:09:45 -08:00
Jan Wassenberg	a60b564b88	Infra improvements (2) ops.h: move CreateInvTimescale to allow calling without depending on gemma Pass around MatMulEnv instead of pools to avoid re-creating the env profiler.h can now be used outside SIMD code allocator: add StepBytes and QuantumSteps rename worker thread with package/cluster in the name threading: add Visit* to IndexRange PiperOrigin-RevId: 718766704	2025-01-23 01:55:19 -08:00
Jan Wassenberg	c4398fc72d	Infra improvements: allocator: support mmap, fixed Bind, add padding bench_matmul: Add PreventElision BUILD: add ops_test build target matmul.h: move ConstMat here; dynamic alloc of MatMulEnv matmul_test: remove benchmarking replace fprintf with HWY_WARN threading.cc: support splitting large clusters (disabled); package_idx->pkg_idx, smaller IndexRangePartition PiperOrigin-RevId: 717512274	2025-01-20 06:22:49 -08:00
Daniel Keysers	493688f6f1	Allow interactive use with new single-file weight format. Add section about new weights format to README.md. Remove model_type_required parameter. Update error handling for flags. PiperOrigin-RevId: 715788822	2025-01-15 07:22:33 -08:00
Ray Smith	9d40f0117e	Added ability to load/save a complete model file, including tokenizer. PiperOrigin-RevId: 707914366	2024-12-19 07:59:41 -08:00
The gemma.cpp Authors	5bc356f18f	Internal change PiperOrigin-RevId: 707268913	2024-12-17 15:15:57 -08:00
Daniel Keysers	62c70d6715	Rename ModelTraining to PromptWrapping which is a more accurate name. PiperOrigin-RevId: 705881500	2024-12-13 07:45:59 -08:00
Ray Smith	6254f2e5ca	Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT PiperOrigin-RevId: 705085054	2024-12-11 06:30:28 -08:00
Ray Smith	e69bc3bc1c	Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future. Corrected some errors in the TensorIndex. PiperOrigin-RevId: 705014619	2024-12-11 01:26:35 -08:00
Jan Wassenberg	642fc97d51	Internal change PiperOrigin-RevId: 704692923	2024-12-10 06:58:32 -08:00
Jan Wassenberg	f74d496879	Threading/infra improvements. * Add ParallelizeRange helpers and partitioning helpers Refactor Pinning class, store original affinity (required to construct another NestedPools after pinning happened) Compress: * prevent Compress printing stats in tests * zero-pad tensors Matmul: * add matmul_unit_test (TODO) and bench_matmul * matmul_test: change norm to row vectors (that is what is added) and include bf16 rounding error * Prepare for L2/L3 retrieval PiperOrigin-RevId: 700603811	2024-11-27 01:12:00 -08:00
Ray Smith	3d1625d8c5	Improved consistency of compressor API, and added a universal method with a target type arg. Moved configs pybind up to root level. PiperOrigin-RevId: 698743417	2024-11-21 05:27:40 -08:00
Ray Smith	73640d2521	Added tensor_index as a single source of truth on tensor shapes/sources and transformations PiperOrigin-RevId: 697903886	2024-11-19 00:25:39 -08:00
Ray Smith	96513a8dc3	Added a blob_compare tool that compares two sbs files that may have the blobs in a different order PiperOrigin-RevId: 696458888	2024-11-14 03:26:32 -08:00
Paul Chang	5674c33dc5	Replace CLIF SbsWriter with pybind-based gcpp extension Maintains compatibility with previous version. PiperOrigin-RevId: 696181603	2024-11-13 10:20:02 -08:00
Paul Chang	b94295b6d9	Internal changes PiperOrigin-RevId: 696155630	2024-11-13 09:01:38 -08:00
Paul Chang	d4050a2917	Expose BlobReader::Keys() PiperOrigin-RevId: 694166186	2024-11-07 10:28:39 -08:00
Jan Wassenberg	868b01601f	Simpler MatMul interface, vocab types, Tristate for use_spinning Add Extents2D, Range2D vocab types Matmul uses ConstMat for inputs and RowPtr for output Move RowVectorBatch to basics.h Separate threading.cc Fix topology string: report cores not LPs, and #HT Move QStride/IsMHA into LayerConfig ImageTokens does not require make_unique. matmul_test: no longer require template args PiperOrigin-RevId: 692963605	2024-11-04 07:48:29 -08:00
Jan Wassenberg	baaa221787	Move BF16 to basics.h for easier access, and use that typedef. PiperOrigin-RevId: 691422334	2024-10-30 08:09:11 -07:00
Daniel Keysers	583bd93e9a	Factor out addition of ViTConfig to a ModelConfig. Use ModelConfig values for ImageTokens. Output timing info for image token generation. Add a method to copy image data into Image class directly. Minor changes: pipe ModelTraining to more places. PiperOrigin-RevId: 690572283	2024-10-28 05:29:33 -07:00
Jan Wassenberg	19cfe14c76	Warning fixes (casts) and fix Windows build for aligned_alloc PiperOrigin-RevId: 689734618	2024-10-25 04:14:04 -07:00
Jan Wassenberg	52af531820	Serialization for class members for use with ModelConfig PiperOrigin-RevId: 689720027	2024-10-25 03:12:34 -07:00
Paul Chang	4197d69dfc	New blob_store_test, ensure ReadOne checks actual size against requested size PiperOrigin-RevId: 688974390	2024-10-23 08:30:46 -07:00
RangerUFO	7d313aaade	Fix compilation errors of "compress_weights" target	2024-10-19 21:30:30 +08:00

135 Commits