gemma.cpp

Commit Graph

Author	SHA1	Message	Date
The gemma.cpp Authors	1e8642f8f4	Internal change. PiperOrigin-RevId: 765037449	2025-05-29 22:51:16 -07:00
Jan Wassenberg	3890eb5412	Remove backprop/ Also remove MatPtrT::Packed(); use PackedScale1 instead where const, or Row(0). PiperOrigin-RevId: 764243198	2025-05-28 07:01:17 -07:00
Jan Wassenberg	627cc04db9	Decouple MatMul from gemma-inl: precompile for all input types Call MatMulStatic instead of MatMul. Also fix build error due to Highway's Lanes not being constexpr. PiperOrigin-RevId: 763777269	2025-05-27 07:08:58 -07:00
Jan Wassenberg	421a2ab8ac	Add comments explaining non-padded tensors, kNoPad -> kPacked PiperOrigin-RevId: 763352173	2025-05-26 03:03:38 -07:00
Copybara-Service	eb8a463038	Merge pull request #574 from ufownl:bugfix/vit_weights PiperOrigin-RevId: 761948356	2025-05-22 07:04:53 -07:00
RangerUFO	2771f463f9	Fix the ViT weights loading	2025-05-22 12:13:29 +08:00
Copybara-Service	1ce89788ef	Merge pull request #573 from ufownl:bugfix/vit PiperOrigin-RevId: 761425663	2025-05-21 01:58:00 -07:00
RangerUFO	6debdbe341	Minor fixes for ViT	2025-05-20 22:27:10 +08:00
Jan Wassenberg	cb188d4a0e	Fix RowT issue and improve Griffin (currently still broken) Use type-safe MatPtrT via dynamic_cast, avoid/remove unsafe RowT activations: Griffin tensors are now padded Griffin: add batching support, fix conv1d_cache allocation weights: bundle to TensorToRead, add kNoPad flag, fix SplitW1 const-correct fix for ForEachTensor blob_store: move BlobIO2 to .cc and rename BlobIO PiperOrigin-RevId: 760610094	2025-05-19 07:02:10 -07:00
Jan Wassenberg	d6cfabc2c1	Shorten gemma_test so we can run it for more models. PiperOrigin-RevId: 759685282	2025-05-16 11:14:41 -07:00
Jan Wassenberg	e890d46f30	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding Only the weights; binding MatMul output worsens batch=1 prefill. Update gemma_batch_bench to use --decode_qbatch. Fix/remove prefill_activations in gemma-inl.h. Refactor: use BasePageBytes directly when binding Move BindB/C to .cc by de-templatizing Remove MatOwners::AllocateFor because it is weights-specific (binding or not) Disband MatOwners, replace with vector PiperOrigin-RevId: 759610477	2025-05-16 07:42:13 -07:00
Jan Wassenberg	c443adee33	3.8x speedup of weights loading via preadv on Linux Also move BlobReader reading functionality to weights.cc PiperOrigin-RevId: 759240310	2025-05-15 11:55:15 -07:00
Jan Wassenberg	38a08d8095	Replace last ConstMat with MatPtr This is to reduce the number of MatMul overloads in preparation for de-templatizing. PiperOrigin-RevId: 758288589	2025-05-13 10:55:22 -07:00
Copybara-Service	0a6a7e4cd6	Merge pull request #566 from ufownl:bugfix/deduced_model_wrapping PiperOrigin-RevId: 758276145	2025-05-13 10:28:16 -07:00
RangerUFO	30ad625f42	Fix the wrapping field of the deduced model config	2025-05-13 23:02:03 +08:00
Jan Wassenberg	8a312e9b89	Split W1/W2 as a load-time preprocess. Remove kOnlyAllocate - no longer used. Rename ReadOrAllocate -> ReadFromBlobs. Rename Reshape -> Fixup to reflect the new scope. Remove no longer used ShrinkRows. This simplifies gemma-inl and is a prerequisite for removing ConstMat (whose .ofs was previously used for merged tensors) PiperOrigin-RevId: 758214083	2025-05-13 07:39:59 -07:00
Jan Wassenberg	2038dfd9cc	Minor: rename compression/shared -> types.h PiperOrigin-RevId: 758199851	2025-05-13 06:53:21 -07:00
Jan Wassenberg	d538a6d6c6	Cleanup: remove unused kCyclic, remove 2 suffix Also remove now unused allocator arg and fix warnings (cast, struct/class mismatch) PiperOrigin-RevId: 758098495	2025-05-13 01:06:41 -07:00
Biruk Mammo	ba21e3beb4	Adds a `GemmaAttention` constructor that takes an explicit `ThreadingContext`. PiperOrigin-RevId: 757839682	2025-05-12 11:17:05 -07:00
Jan Wassenberg	45ad847a41	Replace RowVectorBatch with MatStorageT KVCache: add ctor required for MatStorageT, remove Create; bf_pre_ffw_rms_out -> pre_ffw_rms_out optimize_test: larger vocab_size requires more steps shared.h: Remove unused u128 type correctly set Activation matrix rows, avoid passing as arg ops: pass Mat instead of pointers/sizes; vectorize LayerNorm; support any weight type mat: add OverrideRows, used by SetBatchSize PiperOrigin-RevId: 757790736	2025-05-12 09:16:12 -07:00
Jan Wassenberg	cf7dd80c17	Minor: mark command line flags as required PiperOrigin-RevId: 757775369	2025-05-12 08:30:44 -07:00
Jan Wassenberg	252a4e955e	Remove support for Gemma 1 and PaliGemma 1 models, superseded by (Pali)Gemma 2. PiperOrigin-RevId: 756671308	2025-05-09 02:17:27 -07:00
Biruk Mammo	d834c07042	Exposes `GemmaAttention::DotSoftmaxWeightedSum` for experimentation. Also in this change: * The computation for a single `q` is factored out and exposed. * Strided `ConstMat` views into the KV caches are introduced to enable experimentation with various KV cache layouts. PiperOrigin-RevId: 756339313	2025-05-08 09:19:04 -07:00
Jan Wassenberg	a0ff98ea60	Entirely remove constexpr on PaddedDirEnd. Refs #551 Apparently GCC 9.4 does not handle HWY_CXX17_CONSTEXPR as we intend. PiperOrigin-RevId: 755967709	2025-05-07 12:48:19 -07:00
Biruk Mammo	d9d1709df8	Updates stale references to `compression/migrate_weights`. PiperOrigin-RevId: 755938143	2025-05-07 11:33:59 -07:00
The gemma.cpp Authors	20757046db	cleanup, new conversation methods, bugfixes - chore: unused parameters cleaned up - bugfix: explicitly use hwy::Span in GenerateInternal() to prevent runtime crashes due to memory layout incompatibility - bugfix: explicit nullptr check in LogDebug - chore: length-related parameters renamed for clarity - feature: SaveConversation() can be optionally used to save copy of a conversation that ResetConversation() will rewind to upon request, rather than just an empty KV cache - feature: GetCurrentConversation() can be used to query the current conversation's name PiperOrigin-RevId: 755873147	2025-05-07 08:52:44 -07:00
Jan Wassenberg	e9ecb7794d	Fix gcc build error and gemma3 crash, thanks @ufownl, fixes #551 PiperOrigin-RevId: 755729478	2025-05-07 00:59:18 -07:00
Jan Wassenberg	c8d92948f4	Move fields, io* and blob* from compression/ into io/ PiperOrigin-RevId: 755445712	2025-05-06 11:17:19 -07:00
Jan Wassenberg	275135d7e8	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete PiperOrigin-RevId: 755397220	2025-05-06 09:12:43 -07:00
Jan Wassenberg	8d0882b966	Huge refactor of weight handling and model loading. Weight handling: - new ModelStore2 supports both pre-2025 multi-file and single-file formats - simpler ForEachTensor with TensorArgs - tensors are constructed with their full suffixed name I/O: - support mmap and stride - Simplified SbsWriter, single insert(); add SbsReader Misc: - kMockTokenizer: allow creating with unavailable tokenizer - configs.h: Simpler enum validity checks via kSentinel - matmul.h: remove unused enable_bind (now in allocator.h) - tensor_info: single TensorInfoRegistry class, rename from tensor_index.h Frontends: - Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&) - Deduce model/weight type, remove --model and parsing - Replace most common.h includes with configs.h - Remove --compressed_weights, use --weights instead - Remove ModelInfo, replaced by ModelConfig. Backprop: - Reduce max loss, remove backward_scalar_test (timeout) - Update thresholds because new RandInit changes rng eval order and thus numerics PiperOrigin-RevId: 755317484	2025-05-06 04:44:21 -07:00
Jan Wassenberg	a3caf6e5d2	Add summary of optimizations/infra present in the repository PiperOrigin-RevId: 754838402	2025-05-05 01:46:01 -07:00
Jan Wassenberg	fe80f10ed7	Backprop test fixes and allocator cleanup - Shorten backprop tests to prevent timeout - Add line number of failing test - matmul: remove unused enable_bind - allocator: we will retain enable_bind there - mat: disable cyclic padding optimization (broken) PiperOrigin-RevId: 752656068	2025-04-29 03:01:10 -07:00
Jan Wassenberg	160a5824fb	Cleanup: include fixes/comments, fix leak, vector reserve Also remove unused RowSpan configs.cc: Assign prompt wrapping to ModelConfig configs.h: simplify EnumValid via sentinel PiperOrigin-RevId: 750278497	2025-04-22 12:01:46 -07:00
The gemma.cpp Authors	ba10c88a94	Add C API and C# interop files This change adds a basic C API that allows access to Gemma functionality from other programming languages. The functionality is exposed via a shared library (DLL on Windows), with C++ interfaces and a basic C# interop wrapper included. To build the DLL, use the `windows-dll` preset, which includes the C and C++ sources as follows: ``` cmake --preset windows-dll cmake --build --config Release --preset windows-dll -j 4 ``` This should generate a `gemma.dll` in `<build-dir>/Release`. To build for non-Windows, the appropriate C++ DLL linking will need to be done to generate a shared library for the target OS. PiperOrigin-RevId: 750246272	2025-04-22 10:35:47 -07:00
Copybara-Service	f20da328de	Merge pull request #539 from prajwalc22:feature-prompt-flag PiperOrigin-RevId: 750118715	2025-04-22 03:09:19 -07:00
prajwalc22	2407150f84	Merge branch 'feature-prompt-flag' of github.com:prajwalc22/gemma.cpp into feature-prompt-flag	2025-04-17 23:54:46 +05:30
prajwalc22	a9e56c27eb	removed unnecessary threading.h import	2025-04-17 23:44:23 +05:30
Prajwal Choudhari	09dfb144c0	Merge branch 'dev' into feature-prompt-flag	2025-04-17 18:53:28 +05:30
prajwalc22	f55c321397	Address review feedback: Fix prefill_tbatch_size and variable placement issues	2025-04-17 10:15:21 +05:30
prajwalc22	27c28cc938	Address review feedback: Fix prefill_tbatch_size and variable placement issues	2025-04-17 10:15:05 +05:30
Jan Wassenberg	87a658b1c6	Minor cleanup, on-demand NUQ buffer allocation threading_context: add profiler compress-inl: add constexpr, on-demand alloc NUQ buffer gemma_py: model->gemma Move ScaleWeights to compress.cc Move PromptWrapping to configs.h PiperOrigin-RevId: 748347896	2025-04-16 10:49:43 -07:00
prajwalc22	8246e49199	Add non-interactive mode support - Added prompt flag to InferenceArgs for non-interactive mode - Set user-facing options to verbosity level 1 - Fixed prompt_size declaration and variable ordering in run.cc - Properly set prompt_size after WrapAndTokenize calls - Moved kVerboseLogTokens block after prompt_size is set	2025-04-16 16:26:52 +05:30
prajwalc22	cbf179990f	Add --prompt flag for non-interactive mode	2025-04-16 15:34:43 +05:30
prajwalc22	716713f0e6	Update .gitignore to exclude build directory and model files	2025-04-16 09:52:30 +05:30
prajwalc22	01caf379ba	Update .gitignore to exclude build directory and model files	2025-04-16 09:45:14 +05:30
prajwalc22	87a1c76578	Update CMake configuration and documentation for --prompt flag	2025-04-16 09:45:14 +05:30
prajwalc22	f3116d2577	Add --prompt flag for non-interactive mode This change adds a --prompt command-line option that allows users to provide prompts directly without entering interactive mode, which is useful for scripting and automation.	2025-04-16 09:45:02 +05:30
The gemma.cpp Authors	7164a5e844	Internal change. PiperOrigin-RevId: 746953110	2025-04-12 20:27:49 -07:00
Jan Wassenberg	2e722f14f1	Add mmap support (not yet used) Also: const-correct ArgsBase, add assert to mat.h checking element_bytes_ BUILD deps update (:shared provides shared.h, not :sfp) PiperOrigin-RevId: 746073312	2025-04-10 10:03:40 -07:00
Jan Wassenberg	8532da47f7	Major refactor of allocator/args: use new ThreadingContext2 instead of monostate/init in each frontend Add ThreadingArgs(replaces AppArgs) backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride compress_weights: remove, moving to py-only exporter instead Move MatPtr to mat.h and revise interface: - Generic MatOwner - rename accessors to Packed* - support stride/row accessors, fix RowPtr stride Add TypeBits(Type) Move GenerateMat to test_util-inl for sharing between matmul test/bench Move internal init to gemma.cc to avoid duplication Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage Remove --compressed_weights, use --weights instead. tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img Allocator: use normal unique_ptr for AllocBytes so users can call directly threading: use -> because AlignedPtr no longer assumes arrays PiperOrigin-RevId: 745918637	2025-04-10 01:29:54 -07:00


794 Commits
