21 Commits

Author SHA1 Message Date
Jan Wassenberg c027a45a2e MatPtr-ify KV, shared div_seq_len, --seq_len flag
PiperOrigin-RevId: 770194455
2025-06-11 09:49:38 -07:00
Jan Wassenberg 3890eb5412 Remove backprop/
Also remove MatPtrT::Packed(); use PackedScale1 instead where const, or Row(0).

PiperOrigin-RevId: 764243198
2025-05-28 07:01:17 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Jan Wassenberg f823371691 Cleanup: move util/compress and convert_weights to compression/
Also remove unused models/, lint convert_weights

PiperOrigin-RevId: 649613088
2024-07-05 04:16:52 -07:00
Jan Wassenberg 355f7b4f80 Update developer docs and mention asan/msan
PiperOrigin-RevId: 644000220
2024-06-17 07:29:12 -07:00
Jan Wassenberg f9b390b134 Support all weight types in a single binary.
This changes the command line flags, but the default value retains the previous behavior.

Also add a CreateGemma helper to enable extra args without interface changes.

PiperOrigin-RevId: 641266411
2024-06-07 09:04:45 -07:00
Jan Wassenberg 57c2cd8b52 Simplifications: remove GemmaInterface and GemmaImpl
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc

PiperOrigin-RevId: 640869202
2024-06-06 05:54:21 -07:00
Jan Wassenberg ca971ef50f Document weight conversion
PiperOrigin-RevId: 626957718
2024-04-22 01:58:30 -07:00
Jan Wassenberg a8ceb75f43 Improved IO abstraction layer
Move to unique_ptr-like File class.
Move `if OS_WIN` into wrapper functions.
exists -> Exists.

PiperOrigin-RevId: 625923056
2024-04-17 23:15:07 -07:00
austinvhuang 810b5a0cc2 Update README with more details on contributing code, add experimental/ directory, add READMEs for subdirectories, clean up DEVELOPER notes 2024-03-15 14:10:24 -04:00
Jan Wassenberg bb9b023502 Support Bazel builds. Fixes #16
Also fix nuq/sfp-inl: warning, cast, and disable SCALAR

PiperOrigin-RevId: 612704056
2024-03-04 22:07:25 -08:00
austinvhuang b841620d6b add using gemma as a library notes to DEVELOPERS 2024-02-29 23:52:59 -05:00
austinvhuang 060c8862dd whitespace cleanup 2024-02-27 21:36:43 -05:00
austinvhuang d37f9c3604 re-enable SortIncludes to conform to vanilla Google style, add comment lines to #includes in gemma.h as barriers to block destructive sorting, update doc + remove shell script 2024-02-27 21:23:33 -05:00
austinvhuang 8f3bd63bf7 Fix copybara include path substitutions errors (which break the google3 build) arising from clang-format linter automation 2024-02-27 17:11:15 -05:00
Dan Zheng 874deee302 Update DEVELOPERS.md 2024-02-27 11:32:33 -08:00
austinvhuang 9cdc9223bc clean up formatting after 129e66ada2, add .clang-format defaults, minor updates to DEVELOPERS doc 2024-02-27 14:22:02 -05:00
Dan Zheng afc354dcb1 Import from GitHub.
PiperOrigin-RevId: 610595796
2024-02-26 19:05:11 -08:00
Dan Zheng 8db89304bd No public description
PiperOrigin-RevId: 610498969
2024-02-26 19:03:48 -08:00
austinvhuang 129e66ada2 Reduce KV cache preallocation to 4096 and make it comptime configurable, add rm build note in readme, add note on comptime options in DEVELOPERS, make multiturn=0 the default 2024-02-26 17:05:32 -05:00
Austin Huang e29cd566cf initial commit 2024-02-21 03:31:22 +00:00