gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	0c64987a96	Abort if args are unrecognized, refactor argument passing. This catches typos/incorrect usage. Refactor: group Loader/Threading/Inference into GemmaArgs. All *Args ctors now have an extra ConsumedArgs& argument. PiperOrigin-RevId: 844690553	2025-12-15 03:18:45 -08:00
Jan Wassenberg	56186193c1	Replace mt19937 with new generator to enable parallel sampling. Split it into immutable AesCtrEngine and RngStream. Also add RowSpan and Logits span. PiperOrigin-RevId: 803336423	2025-09-04 23:49:10 -07:00
Jan Wassenberg	701841897b	Default to disabling per-socket parallelization. weights: default to Read for small-batch (only look at qbatch, not the larger prefill tbatch). PiperOrigin-RevId: 790787643	2025-08-04 09:49:14 -07:00
Jan Wassenberg	ac0d751d20	Rename GetModelConfig->Config. PiperOrigin-RevId: 788506480	2025-07-29 10:18:12 -07:00
Jan Wassenberg	e76e29ce11	De-singleton ThreadingContext so callers can pass in their own. weights.cc: fix BindB argument for bf16 tensors. threading_test: enable autotune. PiperOrigin-RevId: 785763618	2025-07-22 02:08:46 -07:00
Jan Wassenberg	a04cc287b2	Move MatMulEnv out of Gemma to enable concurrent calls. Also update benchmark_helper config print: add profiler, remove free mem. PiperOrigin-RevId: 774662974	2025-06-23 01:20:09 -07:00
Jan Wassenberg	f2adbfbcab	Batch inference fixes: set pos during prefill, fix assert. PiperOrigin-RevId: 772458760	2025-06-17 07:09:44 -07:00
Jan Wassenberg	e5c81f64a1	Major refactor: clarify query_idx (global) vs qi. Refs #607. Fix missing pos increment for last prefill and check that in gemma_test. Thanks to @ufownl for pointing this out. Change argument lists to QBatch with accessors. Increase default seq_len to 8k. PiperOrigin-RevId: 771937385	2025-06-16 02:42:02 -07:00
Jan Wassenberg	c027a45a2e	MatPtr-ify KV, shared div_seq_len, --seq_len flag. PiperOrigin-RevId: 770194455	2025-06-11 09:49:38 -07:00
Jan Wassenberg	9efdcfd45c	1.07x batch decode speedup: more BF16 weights and activations. BF16 att_sums and ffw_out. Support BF16 B views without decompression. Support arbitrary types in MulByConstAndAdd, AddFrom. Also update profiler annotations in ops-inl.h. PiperOrigin-RevId: 766995010	2025-06-03 23:30:18 -07:00
Jan Wassenberg	45ad847a41	Replace RowVectorBatch with MatStorageT. KVCache: add ctor required for MatStorageT, remove Create; bf_pre_ffw_rms_out -> pre_ffw_rms_out. optimize_test: larger vocab_size requires more steps. shared.h: Remove unused u128 type. Correctly set Activation matrix rows, avoid passing as arg. ops: pass Mat instead of pointers/sizes; vectorize LayerNorm; support any weight type. mat: add OverrideRows, used by SetBatchSize. PiperOrigin-RevId: 757790736	2025-05-12 09:16:12 -07:00
The gemma.cpp Authors	20757046db	cleanup, new conversation methods, bugfixes. chore: unused parameters cleaned up; bugfix: explicitly use hwy::Span in GenerateInternal() to prevent runtime crashes due to memory layout incompatibility; bugfix: explicit nullptr check in LogDebug; chore: length-related parameters renamed for clarity; feature: SaveConversation() can be optionally used to save copy of a conversation that ResetConversation() will rewind to upon request, rather than just an empty KV cache; feature: GetCurrentConversation() can be used to query the current conversation's name. PiperOrigin-RevId: 755873147	2025-05-07 08:52:44 -07:00
Jan Wassenberg	8d0882b966	Huge refactor of weight handling and model loading. Weight handling: new ModelStore2 supports both pre-2025 multi-file and single-file formats; simpler ForEachTensor with TensorArgs; tensors are constructed with their full suffixed name. I/O: support mmap and stride; Simplified SbsWriter, single insert(); add SbsReader. Misc: kMockTokenizer: allow creating with unavailable tokenizer; configs.h: Simpler enum validity checks via kSentinel; matmul.h: remove unused enable_bind (now in allocator.h); tensor_info: single TensorInfoRegistry class, rename from tensor_index.h. Frontends: Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&); Deduce model/weight type, remove --model and parsing; Replace most common.h includes with configs.h; Remove --compressed_weights, use --weights instead; Remove ModelInfo, replaced by ModelConfig. Backprop: Reduce max loss, remove backward_scalar_test (timeout); Update thresholds because new RandInit changes rng eval order and thus numerics. PiperOrigin-RevId: 755317484	2025-05-06 04:44:21 -07:00
The gemma.cpp Authors	ba10c88a94	Add C API and C# interop files. This change adds a basic C API that allows access to Gemma functionality from other programming languages. The functionality is exposed via a shared library (DLL on Windows), with C++ interfaces and a basic C# interop wrapper included. To build the DLL, use the `windows-dll` preset, which includes the C and C++ sources as follows: `cmake --preset windows-dll`, then `cmake --build --config Release --preset windows-dll -j 4`. This should generate a `gemma.dll` in `<build-dir>/Release`. To build for non-Windows, the appropriate C++ DLL linking will need to be done to generate a shared library for the target OS. PiperOrigin-RevId: 750246272	2025-04-22 10:35:47 -07:00

14 Commits