copybara-service[bot]
f08a42f18e
Merge c783b82a82 into 506fb22be7
2025-12-14 14:14:38 +00:00
Martin Stolle
506fb22be7
No public description
...
PiperOrigin-RevId: 843665619
2025-12-12 06:37:17 -08:00
Balazs Racz
338cd8a36e
Factors out a new cc_library `:query` from `:gemma-lib`.
...
Moves query-related structs/classes to gemma/query.h.
This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.
PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg
73c3627b67
Add tensor stats and output
...
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Martin Stolle
bfc0dfcfca
Enable flags= parsing
...
PiperOrigin-RevId: 843103750
2025-12-11 01:17:59 -08:00
Martin Stolle
78deacc357
Make attention configurable on the command line.
...
PiperOrigin-RevId: 842760721
2025-12-10 09:34:06 -08:00
Martin Stolle
2441ff01bf
internal change
...
PiperOrigin-RevId: 842749037
2025-12-10 09:01:15 -08:00
Krzysztof Rymski
64178ace38
Internal changes
...
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle
9689fc82f9
internal change
...
PiperOrigin-RevId: 842205671
2025-12-09 06:17:08 -08:00
Krzysztof Rymski
64d700cab5
Internal changes
...
PiperOrigin-RevId: 842194766
2025-12-09 05:42:03 -08:00
Martin Stolle
14a9ecf21d
Factor out SumHeads
...
PiperOrigin-RevId: 842138081
2025-12-09 02:23:16 -08:00
Martin Stolle
1014ae9e2a
Adding a simple test for GemmaAttention
...
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg
5a6895c609
Avoid warning when OS affinity limits us to the second socket
...
Also simplify NumSMT, detect from .smt field directly
PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle
b510ba2ab2
Improve clarity of indices II
...
Sorry, didn't see this one before.
PiperOrigin-RevId: 840218378
2025-12-04 06:33:33 -08:00
Martin Stolle
9348048885
Clean up toPtrs to delegate to toPtr
...
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski
2b4436beb6
Internal changes
...
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Martin Stolle
d2090fddf3
Improve clarity of indices
...
PiperOrigin-RevId: 839805634
2025-12-03 10:11:21 -08:00
Nitin Gangahar
6d3e2b6f73
Add missing includes.
...
PiperOrigin-RevId: 839604341
2025-12-02 23:23:09 -08:00
Jan Wassenberg
a084d33e41
Fix Gemma3 image: ensure A matrix is packed, preallocate
...
Also ignore -2 tokens
PiperOrigin-RevId: 838869988
2025-12-01 11:47:23 -08:00
Jan Wassenberg
1564dd3111
Fix empty enabled_lps in topology detection
...
Also expand the debug output.
PiperOrigin-RevId: 838832605
2025-12-01 10:23:47 -08:00
Krzysztof Rymski
6e5e4123f1
Internal changes
...
PiperOrigin-RevId: 837775282
2025-11-28 02:37:06 -08:00
Jan Wassenberg
3c9e6cf113
Expand debug output for topology
...
PiperOrigin-RevId: 837738553
2025-11-28 00:19:33 -08:00
Jan Wassenberg
ccb49bc82f
Add ToFloatSlow, move RandomFloat to test_util
...
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski
c153d5255b
Internal changes
...
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle
8696f6dd17
Clarify indices
...
PiperOrigin-RevId: 836235539
2025-11-24 08:27:59 -08:00
Jan Wassenberg
37a25c9ffe
Fix warning (signed vs unsigned)
...
PiperOrigin-RevId: 836106478
2025-11-24 00:51:17 -08:00
Charles Zhao
0e5f4cbf1b
Implement Continuous Batching.
...
(1) Adds a function GenerateTWithContinuousBatching that uses continuous batching when enabled.
(2) Adds ContinuousQBatch as a subclass of QBatch to manage prefill, insertion, and collection of used KV caches.
(3) Expands the unit test to cover more diverse cases.
PiperOrigin-RevId: 836090261
2025-11-23 23:54:02 -08:00
Martin Stolle
88a03b7ec4
Added access to softmax attention internals to regular attention
...
PiperOrigin-RevId: 835244205
2025-11-21 09:01:01 -08:00
Martin Stolle
5a500872b8
Internal change
...
PiperOrigin-RevId: 835115693
2025-11-21 01:17:45 -08:00
Martin Stolle
49d420aeaf
Add some comments.
...
PiperOrigin-RevId: 834173319
2025-11-19 01:09:15 -08:00
The gemma.cpp Authors
b8f6be72b1
Improves autodetection of Gemma3-1B.
...
Uses the key_norm and query_norm layers to disambiguate between the Gemma2-2B and Gemma3-1B models.
Since Gemma3-1B is not multimodal, ViT is not an effective disambiguator. KQ normalization is a structural disambiguator between Gemma2 and Gemma3.
PiperOrigin-RevId: 833213331
2025-11-17 01:12:50 -08:00
The gemma.cpp Authors
7c1656f2fc
Fix NibbleCodec for AVX3_{ZEN4,DL,SPR}
...
PiperOrigin-RevId: 831002073
2025-11-11 11:31:25 -08:00
Jan Wassenberg
3e18db17f4
Avoid hard-coding kPatchSize. Thanks @Somet2mes for reporting. Fixes #762.
...
PiperOrigin-RevId: 829308896
2025-11-07 00:32:31 -08:00
Charles Zhao
f8131339a7
Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called when adding continuous batching.
...
PiperOrigin-RevId: 829104661
2025-11-06 14:20:17 -08:00
Martin Stolle
35e9f9f05f
Introduce attention implementation configurability.
...
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Jan Wassenberg
091b4567c9
Minor: ParallelismStrategy->Parallelism
...
PiperOrigin-RevId: 828936578
2025-11-06 06:56:10 -08:00
Jan Wassenberg
a344a70c59
Change (old) attention behavior to disallow wraparound, enforced via assertion.
...
Shared kU64PerLine constant
PiperOrigin-RevId: 828072451
2025-11-04 11:52:40 -08:00
Charles Zhao
3a63a12624
Allow prefill-only runs by allowing max_prompt_size == seq_len
...
PiperOrigin-RevId: 827415258
2025-11-03 03:17:54 -08:00
Phil Culliton
ab87807a4c
Pre-compress query activations to BF16 before FlashAttention.
...
PiperOrigin-RevId: 826524997
2025-10-31 09:49:44 -07:00
Ray Smith
8a100c1e8d
Added access to flash attention internals to TileFlashAttention4
...
PiperOrigin-RevId: 826011137
2025-10-30 06:50:05 -07:00
Jan Wassenberg
ee7d79c0a6
Add Decompress2AndCompressInplace helper
...
PiperOrigin-RevId: 825966142
2025-10-30 04:04:41 -07:00
Jan Wassenberg
006999063c
Fix PaliGemma matmul warning
...
PiperOrigin-RevId: 825627406
2025-10-29 11:15:50 -07:00
Phil Culliton
ecab0cef3a
Update README with Gemma 3 support and contributor acknowledgments
...
PiperOrigin-RevId: 825588241
2025-10-29 09:46:51 -07:00
Phil Culliton
036f91f63c
Add Gemma 3 270M to gemma_test
...
PiperOrigin-RevId: 825582368
2025-10-29 09:31:32 -07:00
Phil Culliton
116cd6eff6
BF16 mixed-mode flash attention
...
PiperOrigin-RevId: 825433929
2025-10-29 01:48:28 -07:00
Jan Wassenberg
4bd465ffd3
Also update attention.h to type-erased query_norm_scale
...
PiperOrigin-RevId: 825014334
2025-10-28 06:48:33 -07:00
Jan Wassenberg
3cc0139ebb
Fix excessive KC/MC from prior change
...
This could lead to stack overflow in B_storage.
Also do not require specific type for query_norm_scale,
update batch sizes for attention tensors,
more verbose Mat shape/type checks.
PiperOrigin-RevId: 824987689
2025-10-28 05:33:01 -07:00
Biruk Mammo
5a05857deb
[Gemma.cpp] Allows non-owned arguments for attention methods.
...
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtr`s. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.
PiperOrigin-RevId: 824584177
2025-10-27 10:43:25 -07:00
Jan Wassenberg
86200ce224
1.01x speedup: improved autotune
...
Group M=4..7 into same config. Add configs for power of two sizes.
Allow odd mc to enable a single range for odd M.
io.cc: warning fix(cast).
IsBlock -> !IsOneMC
benchmark_helper: best for verbosity 3, all configs for 4
ops_test: remove unused includes
PiperOrigin-RevId: 824475104
2025-10-27 05:35:31 -07:00
Jan Wassenberg
8198e7104a
Batch bench: 4 runs to give autotuning more time
...
Also print auto-tune info for verbosity 3.
PiperOrigin-RevId: 823555008
2025-10-24 09:14:39 -07:00