865 Commits

Author SHA1 Message Date
Balazs Racz 384c390181 Allow overriding hardcoded max_seq_len by cmdline argument seq_len.
Adds a SetMaxSeqLen method to ModelConfig to handle updating both max_seq_len and global attention window sizes. The Gemma constructor now checks if the provided inference seq_len exceeds the model's max_seq_len and, if so, emits a warning and updates the config.

This prevents clipping context to the hard-coded maximum.

PiperOrigin-RevId: 853676074
2026-01-08 04:28:59 -08:00
Jan Wassenberg aeade052c6 Move AssertClose to test_util, add U16
PiperOrigin-RevId: 853321311
2026-01-07 10:33:20 -08:00
Krzysztof Rymski 2ee1fac74c Internal changes
PiperOrigin-RevId: 853138600
2026-01-07 01:21:37 -08:00
Jan Wassenberg 1605925d1e Add int8 quantization stats
Compute the L1 error and Shannon SNR (higher is better).

PiperOrigin-RevId: 846832280
2025-12-19 12:43:03 -08:00
Copybara-Service 11aa16a13d Merge pull request #810 from salmanmkc:upgrade-github-actions-node24
PiperOrigin-RevId: 846692015
2025-12-19 05:27:14 -08:00
Krzysztof Rymski 08a0760271 Internal changes
PiperOrigin-RevId: 846663686
2025-12-19 03:43:15 -08:00
Krzysztof Rymski b73a9ede8f Internal changes
PiperOrigin-RevId: 846648337
2025-12-19 02:46:18 -08:00
Balazs Racz 0ac55f71ed Avoid using Row() for unaligned storage.
PiperOrigin-RevId: 846214605
2025-12-18 05:10:57 -08:00
Krzysztof Rymski 6661d3a60c Internal changes
PiperOrigin-RevId: 846140314
2025-12-18 01:26:43 -08:00
Liam Miller-Cushon 142e6a7e9c No public description
PiperOrigin-RevId: 846030124
2025-12-17 20:10:54 -08:00
Phil Culliton b8a409dbba Use hn::Sub for vector subtraction in flash attention.
PiperOrigin-RevId: 845883321
2025-12-17 12:57:34 -08:00
Balazs Racz 596bdfe5af Separate monolithic gemma_lib library into more specific cc_library targets.
Creates new cc_library targets for :attention, :tensor_stats and :activations. Eliminates cyclic dependencies between these libraries.

PiperOrigin-RevId: 845690136
2025-12-17 03:31:16 -08:00
Salman Chishti a4c78d4454 Merge branch 'dev' into upgrade-github-actions-node24
2025-12-16 14:47:59 +00:00
Salman Muin Kayser Chishti b66aa115ac Upgrade GitHub Actions for Node 24 compatibility
2025-12-16 14:26:24 +00:00
Balazs Racz baa69dfb78 Passes the entire runtime_config into the activations constructor.
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Krzysztof Rymski 44dfd69b9b Internal changes
PiperOrigin-RevId: 844759322
2025-12-15 07:14:37 -08:00
Jan Wassenberg 0c64987a96 Abort if args are unrecognized, refactor argument passing
This catches typos/incorrect usage.
Refactor: group Loader/Threading/Inference into GemmaArgs.
All *Args ctors now have an extra ConsumedArgs& argument.
PiperOrigin-RevId: 844690553
2025-12-15 03:18:45 -08:00
Jan Wassenberg f50550f4ce Warning fixes (sign mismatch), switch default
PiperOrigin-RevId: 844679375
2025-12-15 02:41:19 -08:00
Martin Stolle 506fb22be7 No public description
PiperOrigin-RevId: 843665619
2025-12-12 06:37:17 -08:00
Balazs Racz 338cd8a36e Factors out a new cc_library `:query` from `:gemma-lib`.
Moves query-related structs/classes to gemma/query.h.

This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.

PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg 73c3627b67 Add tensor stats and output
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Martin Stolle bfc0dfcfca Enable flags= parsing
PiperOrigin-RevId: 843103750
2025-12-11 01:17:59 -08:00
Martin Stolle 78deacc357 Make attention configurable on the command line.
PiperOrigin-RevId: 842760721
2025-12-10 09:34:06 -08:00
Martin Stolle 2441ff01bf internal change
PiperOrigin-RevId: 842749037
2025-12-10 09:01:15 -08:00
Krzysztof Rymski 64178ace38 Internal changes
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle 9689fc82f9 internal change
PiperOrigin-RevId: 842205671
2025-12-09 06:17:08 -08:00
Krzysztof Rymski 64d700cab5 Internal changes
PiperOrigin-RevId: 842194766
2025-12-09 05:42:03 -08:00
Martin Stolle 14a9ecf21d Factor out SumHeads
PiperOrigin-RevId: 842138081
2025-12-09 02:23:16 -08:00
Martin Stolle 1014ae9e2a Adding a simple test for GemmaAttention
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg 5a6895c609 Avoid warning when OS affinity limits us to the second socket
Also simplify NumSMT, detect from .smt field directly

PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle b510ba2ab2 Improve clarity of indices II
Sorry, didn't see this one before.

PiperOrigin-RevId: 840218378
2025-12-04 06:33:33 -08:00
Martin Stolle 9348048885 Clean up toPtrs to delegate to toPtr
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski 2b4436beb6 Internal changes
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Martin Stolle d2090fddf3 Improve clarity of indices
PiperOrigin-RevId: 839805634
2025-12-03 10:11:21 -08:00
Nitin Gangahar 6d3e2b6f73 Add missing includes.
PiperOrigin-RevId: 839604341
2025-12-02 23:23:09 -08:00
Jan Wassenberg a084d33e41 Fix Gemma3 image: ensure A matrix is packed, preallocate
Also ignore -2 tokens

PiperOrigin-RevId: 838869988
2025-12-01 11:47:23 -08:00
Jan Wassenberg 1564dd3111 Fix empty enabled_lps in topology detection
Also expand the debug output.

PiperOrigin-RevId: 838832605
2025-12-01 10:23:47 -08:00
Krzysztof Rymski 6e5e4123f1 Internal changes
PiperOrigin-RevId: 837775282
2025-11-28 02:37:06 -08:00
Jan Wassenberg 3c9e6cf113 Expand debug output for topology
PiperOrigin-RevId: 837738553
2025-11-28 00:19:33 -08:00
Jan Wassenberg ccb49bc82f Add ToFloatSlow, move RandomFloat to test_util
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski c153d5255b Internal changes
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle 8696f6dd17 Clarify indices
PiperOrigin-RevId: 836235539
2025-11-24 08:27:59 -08:00
Jan Wassenberg 37a25c9ffe Fix warning (signed vs unsigned)
PiperOrigin-RevId: 836106478
2025-11-24 00:51:17 -08:00
Charles Zhao 0e5f4cbf1b Implement Continuous Batching.
(1) A function GenerateTWithContinuousBatching is added to use continuous batching when enabled.

(2) The ContinuousQBatch is added as a subclass of QBatch to manage prefill, insert, used-kv-cache-collection.

(3) Also expanded the unit test to more diverse cases.

PiperOrigin-RevId: 836090261
2025-11-23 23:54:02 -08:00
Martin Stolle 88a03b7ec4 Added access to softmax attention internals to regular attention
PiperOrigin-RevId: 835244205
2025-11-21 09:01:01 -08:00
Martin Stolle 5a500872b8 Internal change
PiperOrigin-RevId: 835115693
2025-11-21 01:17:45 -08:00
Martin Stolle 49d420aeaf Add some comments.
PiperOrigin-RevId: 834173319
2025-11-19 01:09:15 -08:00
The gemma.cpp Authors b8f6be72b1 Improves autodetection of Gemma3-1B.
Uses the key_norm and query_norm layers to disambiguate between the Gemma2-2B and Gemma3-1B models.
Since Gemma3-1B is not multimodal, ViT is not an effective disambiguator. KQ normalization is a structural disambiguator between gemma2 and gemma3.

PiperOrigin-RevId: 833213331
2025-11-17 01:12:50 -08:00
The gemma.cpp Authors 7c1656f2fc Fix NibbleCodec for AVX3_{ZEN4,DL,SPR}
PiperOrigin-RevId: 831002073
2025-11-11 11:31:25 -08:00
Jan Wassenberg 3e18db17f4 Avoid hard-coding kPatchSize. Thanks @Somet2mes for reporting. Fixes #762.
PiperOrigin-RevId: 829308896
2025-11-07 00:32:31 -08:00