Commit Graph

852 Commits

Author SHA1 Message Date
Balazs Racz 85e2e8ae7f Separate monolithic gemma_lib library into more specific cc_library targets.
Creates new cc_library targets for :attention, :tensor_stats and :activations. Eliminates cyclic dependencies between these libraries.

PiperOrigin-RevId: 845238905
2025-12-16 07:14:09 -08:00
Balazs Racz baa69dfb78 Makes the entire runtime_config passed into the activations constructor.
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Krzysztof Rymski 44dfd69b9b Internal changes
PiperOrigin-RevId: 844759322
2025-12-15 07:14:37 -08:00
Jan Wassenberg 0c64987a96 Abort if args are unrecognized, refactor argument passing
This catches typos/incorrect usage.
Refactor: group Loader/Threading/Inference into GemmaArgs.
All *Args ctors now have an extra ConsumedArgs& argument.
PiperOrigin-RevId: 844690553
2025-12-15 03:18:45 -08:00
Jan Wassenberg f50550f4ce Warning fixes (sign mismatch), switch default
PiperOrigin-RevId: 844679375
2025-12-15 02:41:19 -08:00
Martin Stolle 506fb22be7 No public description
PiperOrigin-RevId: 843665619
2025-12-12 06:37:17 -08:00
Balazs Racz 338cd8a36e Factors out a new cc_library `:query` from `:gemma-lib`.
Moves query-related structs/classes to gemma/query.h.

This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.

PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg 73c3627b67 Add tensor stats and output
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Martin Stolle bfc0dfcfca Enable flags= parsing
PiperOrigin-RevId: 843103750
2025-12-11 01:17:59 -08:00
Martin Stolle 78deacc357 Make attention configurable on the command line.
PiperOrigin-RevId: 842760721
2025-12-10 09:34:06 -08:00
Martin Stolle 2441ff01bf internal change
PiperOrigin-RevId: 842749037
2025-12-10 09:01:15 -08:00
Krzysztof Rymski 64178ace38 Internal changes
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle 9689fc82f9 internal change
PiperOrigin-RevId: 842205671
2025-12-09 06:17:08 -08:00
Krzysztof Rymski 64d700cab5 Internal changes
PiperOrigin-RevId: 842194766
2025-12-09 05:42:03 -08:00
Martin Stolle 14a9ecf21d Factor out SumHeads
PiperOrigin-RevId: 842138081
2025-12-09 02:23:16 -08:00
Martin Stolle 1014ae9e2a Adding a simple test for GemmaAttention
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg 5a6895c609 Avoid warning when OS affinity limits us to the second socket
Also simplify NumSMT, detect from .smt field directly

PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle b510ba2ab2 Improve clarity of indices II
Sorry, didn't see this one before.

PiperOrigin-RevId: 840218378
2025-12-04 06:33:33 -08:00
Martin Stolle 9348048885 Clean up toPtrs to delegate to toPtr
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski 2b4436beb6 Internal changes
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Martin Stolle d2090fddf3 Improve clarity of indices
PiperOrigin-RevId: 839805634
2025-12-03 10:11:21 -08:00
Nitin Gangahar 6d3e2b6f73 Add missing includes.
PiperOrigin-RevId: 839604341
2025-12-02 23:23:09 -08:00
Jan Wassenberg a084d33e41 Fix Gemma3 image: ensure A matrix is packed, preallocate
Also ignore -2 tokens

PiperOrigin-RevId: 838869988
2025-12-01 11:47:23 -08:00
Jan Wassenberg 1564dd3111 Fix empty enabled_lps in topology detection
Also expand the debug output.

PiperOrigin-RevId: 838832605
2025-12-01 10:23:47 -08:00
Krzysztof Rymski 6e5e4123f1 Internal changes
PiperOrigin-RevId: 837775282
2025-11-28 02:37:06 -08:00
Jan Wassenberg 3c9e6cf113 Expand debug output for topology
PiperOrigin-RevId: 837738553
2025-11-28 00:19:33 -08:00
Jan Wassenberg ccb49bc82f Add ToFloatSlow, move RandomFloat to test_util
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski c153d5255b Internal changes
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle 8696f6dd17 Clarify indices
PiperOrigin-RevId: 836235539
2025-11-24 08:27:59 -08:00
Jan Wassenberg 37a25c9ffe Fix warning (signed vs unsigned)
PiperOrigin-RevId: 836106478
2025-11-24 00:51:17 -08:00
Charles Zhao 0e5f4cbf1b Implement Continus Batching.
(1) A function GenerateTWithContinuousBatching is added to use continuous batching when enabled.

(2) The ContinuousQBatch is added as a subclass of QBatch to manage prefill, insert, used-kv-cache-collection.

(3) Also expanded the unit test to more diverse cases.

PiperOrigin-RevId: 836090261
2025-11-23 23:54:02 -08:00
Martin Stolle 88a03b7ec4 Added access to softmax attention internals to regular attention
PiperOrigin-RevId: 835244205
2025-11-21 09:01:01 -08:00
Martin Stolle 5a500872b8 Internal change
PiperOrigin-RevId: 835115693
2025-11-21 01:17:45 -08:00
Martin Stolle 49d420aeaf Add some comments.
PiperOrigin-RevId: 834173319
2025-11-19 01:09:15 -08:00
The gemma.cpp Authors b8f6be72b1 Improves autodetection of Gemma3-1B.
Uses the key_norm and query_norm layers to disambiguate between the Gemma2-2B and Gemma3-1B models.
Since Gemma3-1B is not multimodal, ViT is not an effective disambiguator. KQ normalization is a structural disambiguator between gemma2 and gemma3.

PiperOrigin-RevId: 833213331
2025-11-17 01:12:50 -08:00
The gemma.cpp Authors 7c1656f2fc Fix NibbleCodec for AVX3_{ZEN4,DL,SPR}
PiperOrigin-RevId: 831002073
2025-11-11 11:31:25 -08:00
Jan Wassenberg 3e18db17f4 Avoid hard-coding kPatchSize. Thanks @Somet2mes for reporting. Fixes #762.
PiperOrigin-RevId: 829308896
2025-11-07 00:32:31 -08:00
Charles Zhao f8131339a7 Refactor for continous batching. This cl does not change the current behavior of the code. It only extract two functions that will later be called for adding continuous batching.
PiperOrigin-RevId: 829104661
2025-11-06 14:20:17 -08:00
Martin Stolle 35e9f9f05f Introduce attention implementation configurability.
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Jan Wassenberg 091b4567c9 Minor: ParallelismStrategy->Parallelism
PiperOrigin-RevId: 828936578
2025-11-06 06:56:10 -08:00
Jan Wassenberg a344a70c59 Change (old) attention behavior to disallow wraparound, enforced via assertion.
Shared kU64PerLine constant

PiperOrigin-RevId: 828072451
2025-11-04 11:52:40 -08:00
Charles Zhao 3a63a12624 Allow prefill only run by allowing max_prompt_size == seq_len
PiperOrigin-RevId: 827415258
2025-11-03 03:17:54 -08:00
Phil Culliton ab87807a4c Pre-compress query activations to BF16 before FlashAttention.
PiperOrigin-RevId: 826524997
2025-10-31 09:49:44 -07:00
Ray Smith 8a100c1e8d Added access to flash attention internals to TileFlashAttention4
PiperOrigin-RevId: 826011137
2025-10-30 06:50:05 -07:00
Jan Wassenberg ee7d79c0a6 Add Decompress2AndCompressInplace helper
PiperOrigin-RevId: 825966142
2025-10-30 04:04:41 -07:00
Jan Wassenberg 006999063c Fix PaliGemma matmul warning
PiperOrigin-RevId: 825627406
2025-10-29 11:15:50 -07:00
Phil Culliton ecab0cef3a Update README with Gemma 3 support and contributor acknowledgments
PiperOrigin-RevId: 825588241
2025-10-29 09:46:51 -07:00
Phil Culliton 036f91f63c Add Gemma 3 270M to gemma_test
PiperOrigin-RevId: 825582368
2025-10-29 09:31:32 -07:00
Phil Culliton 116cd6eff6 BF16 mixed-mode flash attention
PiperOrigin-RevId: 825433929
2025-10-29 01:48:28 -07:00
Jan Wassenberg 4bd465ffd3 Also update attention.h to type-erased query_norm_scale
PiperOrigin-RevId: 825014334
2025-10-28 06:48:33 -07:00