838 Commits

Author SHA1 Message Date
Martin Stolle 14a9ecf21d Factor out SumHeads
PiperOrigin-RevId: 842138081
2025-12-09 02:23:16 -08:00
Martin Stolle 1014ae9e2a Adding a simple test for GemmaAttention
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg 5a6895c609 Avoid warning when OS affinity limits us to the second socket
Also simplify NumSMT, detect from .smt field directly

PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle b510ba2ab2 Improve clarity of indices II
Sorry, didn't see this one before.

PiperOrigin-RevId: 840218378
2025-12-04 06:33:33 -08:00
Martin Stolle 9348048885 Clean up toPtrs to delegate to toPtr
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski 2b4436beb6 Internal changes
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Martin Stolle d2090fddf3 Improve clarity of indices
PiperOrigin-RevId: 839805634
2025-12-03 10:11:21 -08:00
Nitin Gangahar 6d3e2b6f73 Add missing includes.
PiperOrigin-RevId: 839604341
2025-12-02 23:23:09 -08:00
Jan Wassenberg a084d33e41 Fix Gemma3 image: ensure A matrix is packed, preallocate
Also ignore -2 tokens

PiperOrigin-RevId: 838869988
2025-12-01 11:47:23 -08:00
Jan Wassenberg 1564dd3111 Fix empty enabled_lps in topology detection
Also expand the debug output.

PiperOrigin-RevId: 838832605
2025-12-01 10:23:47 -08:00
Krzysztof Rymski 6e5e4123f1 Internal changes
PiperOrigin-RevId: 837775282
2025-11-28 02:37:06 -08:00
Jan Wassenberg 3c9e6cf113 Expand debug output for topology
PiperOrigin-RevId: 837738553
2025-11-28 00:19:33 -08:00
Jan Wassenberg ccb49bc82f Add ToFloatSlow, move RandomFloat to test_util
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski c153d5255b Internal changes
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle 8696f6dd17 Clarify indices
PiperOrigin-RevId: 836235539
2025-11-24 08:27:59 -08:00
Jan Wassenberg 37a25c9ffe Fix warning (signed vs unsigned)
PiperOrigin-RevId: 836106478
2025-11-24 00:51:17 -08:00
Charles Zhao 0e5f4cbf1b Implement Continuous Batching.
(1) A function GenerateTWithContinuousBatching is added to use continuous batching when enabled.

(2) The ContinuousQBatch is added as a subclass of QBatch to manage prefill, insert, used-kv-cache-collection.

(3) Also expanded the unit test to more diverse cases.

PiperOrigin-RevId: 836090261
2025-11-23 23:54:02 -08:00
Martin Stolle 88a03b7ec4 Added access to softmax attention internals to regular attention
PiperOrigin-RevId: 835244205
2025-11-21 09:01:01 -08:00
Martin Stolle 5a500872b8 Internal change
PiperOrigin-RevId: 835115693
2025-11-21 01:17:45 -08:00
Martin Stolle 49d420aeaf Add some comments.
PiperOrigin-RevId: 834173319
2025-11-19 01:09:15 -08:00
The gemma.cpp Authors b8f6be72b1 Improves autodetection of Gemma3-1B.
Uses the key_norm and query_norm layers to disambiguate between the Gemma2-2B and Gemma3-1B models.
Since Gemma3-1B is not multimodal, ViT is not an effective disambiguator. KQ normalization is a structural disambiguator between gemma2 and gemma3.

PiperOrigin-RevId: 833213331
2025-11-17 01:12:50 -08:00
The gemma.cpp Authors 7c1656f2fc Fix NibbleCodec for AVX3_{ZEN4,DL,SPR}
PiperOrigin-RevId: 831002073
2025-11-11 11:31:25 -08:00
Jan Wassenberg 3e18db17f4 Avoid hard-coding kPatchSize. Thanks @Somet2mes for reporting. Fixes #762.
PiperOrigin-RevId: 829308896
2025-11-07 00:32:31 -08:00
Charles Zhao f8131339a7 Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called when adding continuous batching.
PiperOrigin-RevId: 829104661
2025-11-06 14:20:17 -08:00
Martin Stolle 35e9f9f05f Introduce attention implementation configurability.
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Jan Wassenberg 091b4567c9 Minor: ParallelismStrategy->Parallelism
PiperOrigin-RevId: 828936578
2025-11-06 06:56:10 -08:00
Jan Wassenberg a344a70c59 Change (old) attention behavior to disallow wraparound, enforced via assertion.
Shared kU64PerLine constant

PiperOrigin-RevId: 828072451
2025-11-04 11:52:40 -08:00
Charles Zhao 3a63a12624 Allow prefill only run by allowing max_prompt_size == seq_len
PiperOrigin-RevId: 827415258
2025-11-03 03:17:54 -08:00
Phil Culliton ab87807a4c Pre-compress query activations to BF16 before FlashAttention.
PiperOrigin-RevId: 826524997
2025-10-31 09:49:44 -07:00
Ray Smith 8a100c1e8d Added access to flash attention internals to TileFlashAttention4
PiperOrigin-RevId: 826011137
2025-10-30 06:50:05 -07:00
Jan Wassenberg ee7d79c0a6 Add Decompress2AndCompressInplace helper
PiperOrigin-RevId: 825966142
2025-10-30 04:04:41 -07:00
Jan Wassenberg 006999063c Fix PaliGemma matmul warning
PiperOrigin-RevId: 825627406
2025-10-29 11:15:50 -07:00
Phil Culliton ecab0cef3a Update README with Gemma 3 support and contributor acknowledgments
PiperOrigin-RevId: 825588241
2025-10-29 09:46:51 -07:00
Phil Culliton 036f91f63c Add Gemma 3 270M to gemma_test
PiperOrigin-RevId: 825582368
2025-10-29 09:31:32 -07:00
Phil Culliton 116cd6eff6 BF16 mixed-mode flash attention
PiperOrigin-RevId: 825433929
2025-10-29 01:48:28 -07:00
Jan Wassenberg 4bd465ffd3 Also update attention.h to type-erased query_norm_scale
PiperOrigin-RevId: 825014334
2025-10-28 06:48:33 -07:00
Jan Wassenberg 3cc0139ebb Fix excessive KC/MC from prior change
This could lead to stack overflow in B_storage.

Also do not require specific type for query_norm_scale,
update batch sizes for attention tensors,
more verbose Mat shape/type checks.

PiperOrigin-RevId: 824987689
2025-10-28 05:33:01 -07:00
Biruk Mammo 5a05857deb [Gemma.cpp] Allows non-owned arguments for attention methods.
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.

PiperOrigin-RevId: 824584177
2025-10-27 10:43:25 -07:00
Jan Wassenberg 86200ce224 1.01x speedup: improved autotune
Group M=4..7 into same config. Add configs for power of two sizes.
Allow odd mc to enable a single range for odd M.

io.cc: warning fix (cast).
IsBlock -> !IsOneMC
benchmark_helper: best for verbosity 3, all configs for 4
ops_test: remove unused includes
PiperOrigin-RevId: 824475104
2025-10-27 05:35:31 -07:00
Jan Wassenberg 8198e7104a Batch bench: 4 runs to give autotuning more time
Also print auto-tune info for verbosity 3.

PiperOrigin-RevId: 823555008
2025-10-24 09:14:39 -07:00
Theotime Combes 1bdde1af3c Add config flag for global timescale & rely on config to deduce wrapping
PiperOrigin-RevId: 823512377
2025-10-24 06:54:56 -07:00
Jan Wassenberg a48e614f64 1.02x speedup: improve load balance and simplify parallelFor
Remove ParallelizeOne/TwoRange, use ParallelForAcross/WithinCluster instead.

PiperOrigin-RevId: 823388890
2025-10-24 00:19:09 -07:00
Nitin Gangahar 085a34965a Update README since backprop and Adam optimizer have been deleted.
PiperOrigin-RevId: 823388833
2025-10-24 00:18:05 -07:00
Jan Wassenberg 3ed403e287 Major cleanup of profiler zones, add Caller annotation for all pool.Run
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor

PiperOrigin-RevId: 822934530
2025-10-23 01:54:24 -07:00
Nitin Gangahar 9e8ac7e2f0 Use correct offsets in BlobWriter.
Updates the FileSize() calls in BlobWriter to instead use a computed offset.
FileSize() may not work with all implementations of File which can cause issues
while writing.

PiperOrigin-RevId: 822646338
2025-10-22 10:29:04 -07:00
Copybara-Service 64a82ed645 Merge pull request #735 from Hitesh-ed:gemma.cpp-windows-build-fix
PiperOrigin-RevId: 822559272
2025-10-22 06:26:29 -07:00
Hitesh K V 027288b5e4 Merge branch 'dev' into gemma.cpp-windows-build-fix
2025-10-22 16:53:48 +05:30
Jan Wassenberg acede9d682 Warning fix (unused var), Windows build fix (missing member variable)
PiperOrigin-RevId: 822172982
2025-10-21 10:17:34 -07:00
Hitesh K V c55120fc6d Merge branch 'dev' into gemma.cpp-windows-build-fix
2025-10-16 20:18:09 +05:30
Jan Wassenberg f59eb2ed72 Remove multi-package support from topology
Also no longer assume equal-sized clusters

PiperOrigin-RevId: 820164125
2025-10-16 04:00:35 -07:00