gemma.cpp

621 Commits 20 Branches 5 Tags 35 MiB

Author	SHA1	Message	Date
Jan Wassenberg	8d0882b966	Huge refactor of weight handling and model loading. Weight handling: - new ModelStore2 supports both pre-2025 multi-file and single-file formats - simpler ForEachTensor with TensorArgs - tensors are constructed with their full suffixed name I/O: - support mmap and stride - Simplified SbsWriter, single insert(); add SbsReader Misc: - kMockTokenizer: allow creating with unavailable tokenizer - configs.h: Simpler enum validity checks via kSentinel - matmul.h: remove unused enable_bind (now in allocator.h) - tensor_info: single TensorInfoRegistry class, rename from tensor_index.h Frontends: - Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&) - Deduce model/weight type, remove --model and parsing - Replace most common.h includes with configs.h - Remove --compressed_weights, use --weights instead - Remove ModelInfo, replaced by ModelConfig. Backprop: - Reduce max loss, remove backward_scalar_test (timeout) - Update thresholds because new RandInit changes rng eval order and thus numerics PiperOrigin-RevId: 755317484	2025-05-06 04:44:21 -07:00
Jan Wassenberg	160a5824fb	Cleanup: include fixes/comments, fix leak, vector reserve Also remove unused RowSpan configs.cc: Assign prompt wrapping to ModelConfig configs.h: simplify EnumValid via sentinel PiperOrigin-RevId: 750278497	2025-04-22 12:01:46 -07:00
Daniel Keysers	5bbe814a53	Tiny cleanup. PiperOrigin-RevId: 704636988	2024-12-10 03:34:05 -08:00
Daniel Keysers	719699f132	Make top_k a runtime argument (instead of a model argument). PiperOrigin-RevId: 696170691	2024-11-13 09:48:59 -08:00
Ray Smith	0d68555f87	Eliminated TConfig. Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively. Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly. Adjusted WeightsWrapper and ForwardLayer etc to match. The only remaining template arg is the weight type. This enables all the instantiations to be deleted, apart from one per type. It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately. Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively. PiperOrigin-RevId: 686870060	2024-10-17 05:04:22 -07:00
Daniel Keysers	a4d6adbc43	Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize. Remove max_tokens (and rely on only max_generated_tokens). PiperOrigin-RevId: 685662260	2024-10-14 04:45:21 -07:00
Jan Wassenberg	2d14d796e3	1.09x decode speedup for topk=1/temp0: fuse softmax and sample PiperOrigin-RevId: 680589099	2024-09-30 08:37:41 -07:00
Jan Wassenberg	85fcd3cd80	Cleanup: add ModelInfo struct, remove gcpp:: PiperOrigin-RevId: 648707763	2024-07-02 07:11:15 -07:00
Jan Wassenberg	af8eb2fde3	Declutter gemma/ directory, move binaries to evals/ and util/. PiperOrigin-RevId: 648400795	2024-07-01 09:51:04 -07:00

Renamed from gemma/cross_entropy.cc (Browse further)

9 Commits