Commit Graph

34 Commits

Author SHA1 Message Date
Jan Wassenberg 56186193c1 Replace mt19937 with new generator to enable parallel sampling
Split it into immutable AesCtrEngine and RngStream
Also add RowSpan and Logits span

PiperOrigin-RevId: 803336423
2025-09-04 23:49:10 -07:00
Jan Wassenberg ed2f0bd1b0 Fix pos assertions, refs #665
Ensure the streaming func pos matches the number of calls.
Add two arguments that control pos+1 and pos+=1 behavior.
Also cleanup/add comments.
run: use batch_stream_func, add assert, higher verbosity for MM autotune output
PiperOrigin-RevId: 799511163
2025-08-26 04:50:40 -07:00
Jan Wassenberg ac0d751d20 Rename GetModelConfig->Config
PiperOrigin-RevId: 788506480
2025-07-29 10:18:12 -07:00
Jan Wassenberg a04cc287b2 Move MatMulEnv out of Gemma to enable concurrent calls
Also update benchmark_helper config print: add profiler, remove free mem

PiperOrigin-RevId: 774662974
2025-06-23 01:20:09 -07:00
Jan Wassenberg e5c81f64a1 Major refactor: clarify query_idx (global) vs qi. Refs #607
Fix missing pos increment for last prefill and check that in gemma_test.
Thanks to @ufownl for pointing this out.

Change argument lists to QBatch with accessors.
Increase default seq_len to 8k.

PiperOrigin-RevId: 771937385
2025-06-16 02:42:02 -07:00
Jan Wassenberg ec02726cf7 6x large-batch, short-prompt prefill speedup
Parallelize over queries instead of tokens
introduce non_eos so we only iterate over not yet EOS queries; remove TokenStreamer.
move RMSNormInplaceBatched out of Transformer to call the latter from prefill
Consistent arg order.

Fix gemma_test EOS handling which (caught by msan), remove from tokenizer.h
Also add output to gemma_batch_bench, fix name

PiperOrigin-RevId: 769676106
2025-06-10 09:56:20 -07:00
Daniel Keysers d7b23d532a Restructure internal initialization.
PiperOrigin-RevId: 769507096
2025-06-10 01:25:31 -07:00
Jan Wassenberg d6cfabc2c1 Shorten gemma_test so we can run it for more models.
PiperOrigin-RevId: 759685282
2025-05-16 11:14:41 -07:00
Jan Wassenberg cf7dd80c17 Minor: mark command line flags as required
PiperOrigin-RevId: 757775369
2025-05-12 08:30:44 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Jan Wassenberg 8532da47f7 Major refactor of allocator/args:
use new ThreadingContext2 instead of monostate/init in each frontend
Add ThreadingArgs(replaces AppArgs)

backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride
compress_weights: remove, moving to py-only exporter instead

Move MatPtr to mat.h and revise interface:
- Generic MatOwner
- rename accessors to Packed*
- support stride/row accessors, fix RowPtr stride

Add TypeBits(Type)
Move GenerateMat to test_util-inl for sharing between matmul test/bench
Move internal init to gemma.cc to avoid duplication
Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage
Remove --compressed_weights, use --weights instead.
tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img
Allocator: use normal unique_ptr for AllocBytes so users can call directly
threading: use -> because AlignedPtr no longer assumes arrays
PiperOrigin-RevId: 745918637
2025-04-10 01:29:54 -07:00
Copybara-Service bef91a3f03 Merge pull request #529 from ufownl:refactor/wrap_and_tokenize
PiperOrigin-RevId: 745174371
2025-04-08 09:22:26 -07:00
RangerUFO ca4ee2b63f Refactor `WrapAndTokenize` to work properly with Gemma3 2025-03-29 11:31:39 +08:00
Daniel Keysers 493688f6f1 Allow interactive use with new single-file weight format.
Add section about new weights format to README.md.
Remove model_type_required parameter.
Update error handling for flags.

PiperOrigin-RevId: 715788822
2025-01-15 07:22:33 -08:00
Daniel Keysers aed17396be Make prompt wrapping more consistent and fix duplicated tokens for multi-turn.
Do not echo <end_of_turn> tokens to the user.
Have verbosity=0 only show the dialog.

PiperOrigin-RevId: 705021391
2024-12-11 01:52:00 -08:00
Daniel Keysers 719699f132 Make top_k a runtime argument (instead of a model argument).
PiperOrigin-RevId: 696170691
2024-11-13 09:48:59 -08:00
Daniel Keysers e54d9cbddd Fix Griffin model:
- use HalfRope position encodings
- zero-initialize the caches for each Generate at position 0

The lack of the latter made the tests in gemma_test dependent on each other.

PiperOrigin-RevId: 694509054
2024-11-08 08:30:53 -08:00
Daniel Keysers a4d6adbc43 Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.
Remove max_tokens (and rely on only max_generated_tokens).

PiperOrigin-RevId: 685662260
2024-10-14 04:45:21 -07:00
Daniel Keysers 673673cc98 Update expected entropy values for GRIFFIN_2B model.
These changed after introduction of "Cascaded summation for Softmax"

PiperOrigin-RevId: 678145851
2024-09-24 02:12:59 -07:00
Daniel Keysers 760a69449e Add entropy expectations for Griffin-2b model in gemma_test and make sure it passes.
PiperOrigin-RevId: 675564389
2024-09-17 07:46:06 -07:00
Daniel Keysers 437e0eb9af Internal change. Slight restructuring of gemma_test.
PiperOrigin-RevId: 670529565
2024-09-03 06:16:09 -07:00
Daniel Keysers a8e08778d4 Add an additional QueryModel() overload to GemmaEnv.
Use args only in GemmaEnv constructor, store everything else in RuntimeConfig.
Add runtime option to turn off thread spinning.

PiperOrigin-RevId: 670467320
2024-09-03 02:25:19 -07:00
Daniel Keysers 3c17911875 Make gemma_test slightly more allowing on MultiTurn.
PiperOrigin-RevId: 668097277
2024-08-27 12:40:16 -07:00
Jan Wassenberg c4303cd89b Fix test for 2b - update prompt
PiperOrigin-RevId: 667878053
2024-08-27 00:56:47 -07:00
Daniel Keysers 18e6012872 Fix prefill for batched queries.
This lets gemma_test/GeographyBatched pass now also for gemma2-27B.

PiperOrigin-RevId: 664827485
2024-08-19 08:50:42 -07:00
Jan Wassenberg 22995c699d Simplify pos handling, auto-increment output arg
- no longer multiply by num_queries
- remove unused interleaved prompts
- Rename to Queries*
- Rename batch_start/interleaved_pos/pos to queries_pos

PiperOrigin-RevId: 663331823
2024-08-15 09:25:26 -07:00
Daniel Keysers 7316ee8f96 Fix gemma_test GeographyBatched for 2b-it and add entropy expectations for gemma2-2b-it.
PiperOrigin-RevId: 662072395
2024-08-12 07:12:46 -07:00
Apoorv Reddy fd1b0743a7 Rename Gemma9B and Gemma27B to Gemma2_9B and Gemma2_27B.
This is to make it clear that these models are part of the Gemma2 family of models.

PiperOrigin-RevId: 661181682
2024-08-09 02:09:06 -07:00
Jan Wassenberg aaf51898b6 Major revamp #2 of Prefill: fix token order, parallel for multi-query
- Allocate only the required KV caches and activation batch size
- Add flags for batch sizes
- Const-correct interface: Span of const int.
- Also clean up the KVCache arg to a span.
- Move kPrefillBatchSize into RuntimeConfig and remove related global constants.

PiperOrigin-RevId: 655893197
2024-07-25 03:28:55 -07:00
Daniel Keysers cf76f0a401 Update gemma_test to also pass for the v1.1. models.
Make it an error if the model cannot be loaded.

PiperOrigin-RevId: 650232602
2024-07-08 06:45:37 -07:00
Jan Wassenberg cbb67b4ee0 Move benchmark_helper to evals/, weights_raw to compression/.
PiperOrigin-RevId: 650155983
2024-07-08 01:13:23 -07:00
Daniel Keysers cdebcc3533 Update gemma_test with the expected entropy values for the IT models of size 2B/7B/9B/27B.
PiperOrigin-RevId: 649662047
2024-07-05 08:58:51 -07:00
Jan Wassenberg 118e802b00 Fix gemma_test - moved to evals/.
PiperOrigin-RevId: 649338633
2024-07-04 02:04:05 -07:00
Jan Wassenberg af8eb2fde3 Declutter gemma/ directory, move binaries to evals/ and util/.
PiperOrigin-RevId: 648400795
2024-07-01 09:51:04 -07:00