copybara-service[bot]
1a12c4d1a6
Merge 61dedf73ed into 5a6895c609
2025-12-09 09:55:42 +00:00
Krzysztof Rymski
61dedf73ed
Internal changes
...
PiperOrigin-RevId: 841765739
2025-12-09 01:55:38 -08:00
Jan Wassenberg
5a6895c609
Avoid warning when OS affinity limits us to the second socket
...
Also simplify NumSMT, detect from .smt field directly
PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle
b510ba2ab2
Improve clarity of indices II
...
Sorry, didn't see this one before.
PiperOrigin-RevId: 840218378
2025-12-04 06:33:33 -08:00
Martin Stolle
9348048885
Clean up toPtrs to delegate to toPtr
...
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski
2b4436beb6
Internal changes
...
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00
Martin Stolle
d2090fddf3
Improve clarity of indices
...
PiperOrigin-RevId: 839805634
2025-12-03 10:11:21 -08:00
Nitin Gangahar
6d3e2b6f73
Add missing includes.
...
PiperOrigin-RevId: 839604341
2025-12-02 23:23:09 -08:00
Jan Wassenberg
a084d33e41
Fix Gemma3 image: ensure A matrix is packed, preallocate
...
Also ignore -2 tokens
PiperOrigin-RevId: 838869988
2025-12-01 11:47:23 -08:00
Jan Wassenberg
1564dd3111
Fix empty enabled_lps in topology detection
...
Also expand the debug output.
PiperOrigin-RevId: 838832605
2025-12-01 10:23:47 -08:00
Krzysztof Rymski
6e5e4123f1
Internal changes
...
PiperOrigin-RevId: 837775282
2025-11-28 02:37:06 -08:00
Jan Wassenberg
3c9e6cf113
Expand debug output for topology
...
PiperOrigin-RevId: 837738553
2025-11-28 00:19:33 -08:00
Jan Wassenberg
ccb49bc82f
Add ToFloatSlow, move RandomFloat to test_util
...
PiperOrigin-RevId: 837412290
2025-11-27 00:14:51 -08:00
Krzysztof Rymski
c153d5255b
Internal changes
...
PiperOrigin-RevId: 837001762
2025-11-26 01:05:35 -08:00
Martin Stolle
8696f6dd17
Clarify indices
...
PiperOrigin-RevId: 836235539
2025-11-24 08:27:59 -08:00
Jan Wassenberg
37a25c9ffe
Fix warning (signed vs unsigned)
...
PiperOrigin-RevId: 836106478
2025-11-24 00:51:17 -08:00
Charles Zhao
0e5f4cbf1b
Implement Continuous Batching.
...
(1) A function GenerateTWithContinuousBatching is added to use continuous batching when enabled.
(2) The ContinuousQBatch is added as a subclass of QBatch to manage prefill, insert, used-kv-cache-collection.
(3) Also expanded the unit test to more diverse cases.
PiperOrigin-RevId: 836090261
2025-11-23 23:54:02 -08:00
Martin Stolle
88a03b7ec4
Added access to softmax attention internals to regular attention
...
PiperOrigin-RevId: 835244205
2025-11-21 09:01:01 -08:00
Martin Stolle
5a500872b8
Internal change
...
PiperOrigin-RevId: 835115693
2025-11-21 01:17:45 -08:00
Martin Stolle
49d420aeaf
Add some comments.
...
PiperOrigin-RevId: 834173319
2025-11-19 01:09:15 -08:00
The gemma.cpp Authors
b8f6be72b1
Improves autodetection of Gemma3-1B.
...
Uses the key_norm and query_norm layers to disambiguate between the Gemma2-2B and Gemma3-1B models.
Since Gemma3-1B is not multimodal, ViT is not an effective disambiguator. KQ normalization is a structural disambiguator between gemma2 and gemma3.
PiperOrigin-RevId: 833213331
2025-11-17 01:12:50 -08:00
The gemma.cpp Authors
7c1656f2fc
Fix NibbleCodec for AVX3_{ZEN4,DL,SPR}
...
PiperOrigin-RevId: 831002073
2025-11-11 11:31:25 -08:00
Jan Wassenberg
3e18db17f4
Avoid hard-coding kPatchSize. Thanks @Somet2mes for reporting. Fixes #762.
...
PiperOrigin-RevId: 829308896
2025-11-07 00:32:31 -08:00
Charles Zhao
f8131339a7
Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called when adding continuous batching.
...
PiperOrigin-RevId: 829104661
2025-11-06 14:20:17 -08:00
Martin Stolle
35e9f9f05f
Introduce attention implementation configurability.
...
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Jan Wassenberg
091b4567c9
Minor: ParallelismStrategy->Parallelism
...
PiperOrigin-RevId: 828936578
2025-11-06 06:56:10 -08:00
Jan Wassenberg
a344a70c59
Change (old) attention behavior to disallow wraparound, enforced via assertion.
...
Shared kU64PerLine constant
PiperOrigin-RevId: 828072451
2025-11-04 11:52:40 -08:00
Charles Zhao
3a63a12624
Allow prefill-only runs by allowing max_prompt_size == seq_len
...
PiperOrigin-RevId: 827415258
2025-11-03 03:17:54 -08:00
Phil Culliton
ab87807a4c
Pre-compress query activations to BF16 before FlashAttention.
...
PiperOrigin-RevId: 826524997
2025-10-31 09:49:44 -07:00
Ray Smith
8a100c1e8d
Added access to flash attention internals to TileFlashAttention4
...
PiperOrigin-RevId: 826011137
2025-10-30 06:50:05 -07:00
Jan Wassenberg
ee7d79c0a6
Add Decompress2AndCompressInplace helper
...
PiperOrigin-RevId: 825966142
2025-10-30 04:04:41 -07:00
Jan Wassenberg
006999063c
Fix PaliGemma matmul warning
...
PiperOrigin-RevId: 825627406
2025-10-29 11:15:50 -07:00
Phil Culliton
ecab0cef3a
Update README with Gemma 3 support and contributor acknowledgments
...
PiperOrigin-RevId: 825588241
2025-10-29 09:46:51 -07:00
Phil Culliton
036f91f63c
Add Gemma 3 270M to gemma_test
...
PiperOrigin-RevId: 825582368
2025-10-29 09:31:32 -07:00
Phil Culliton
116cd6eff6
BF16 mixed-mode flash attention
...
PiperOrigin-RevId: 825433929
2025-10-29 01:48:28 -07:00
Jan Wassenberg
4bd465ffd3
Also update attention.h to type-erased query_norm_scale
...
PiperOrigin-RevId: 825014334
2025-10-28 06:48:33 -07:00
Jan Wassenberg
3cc0139ebb
Fix excessive KC/MC from prior change
...
This could lead to stack overflow in B_storage.
Also do not require specific type for query_norm_scale,
update batch sizes for attention tensors,
more verbose Mat shape/type checks.
PiperOrigin-RevId: 824987689
2025-10-28 05:33:01 -07:00
Biruk Mammo
5a05857deb
[Gemma.cpp] Allows non-owned arguments for attention methods.
...
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.
PiperOrigin-RevId: 824584177
2025-10-27 10:43:25 -07:00
Jan Wassenberg
86200ce224
1.01x speedup: improved autotune
...
Group M=4..7 into same config. Add configs for power of two sizes.
Allow odd mc to enable a single range for odd M.
io.cc: warning fix(cast).
IsBlock -> !IsOneMC
benchmark_helper: best for verbosity 3, all configs for 4
ops_test: remove unused includes
PiperOrigin-RevId: 824475104
2025-10-27 05:35:31 -07:00
Jan Wassenberg
8198e7104a
Batch bench: 4 runs to give autotuning more time
...
Also print auto-tune info for verbosity 3.
PiperOrigin-RevId: 823555008
2025-10-24 09:14:39 -07:00
Theotime Combes
1bdde1af3c
Add config flag for global timescale & rely on config to deduce wrapping
...
PiperOrigin-RevId: 823512377
2025-10-24 06:54:56 -07:00
Jan Wassenberg
a48e614f64
1.02x speedup: improve load balance and simplify parallelFor
...
Remove ParallelizeOne/TwoRange, use ParallelForAcross/WithinCluster instead.
PiperOrigin-RevId: 823388890
2025-10-24 00:19:09 -07:00
Nitin Gangahar
085a34965a
Update README since backprop and Adam optimizer have been deleted.
...
PiperOrigin-RevId: 823388833
2025-10-24 00:18:05 -07:00
Jan Wassenberg
3ed403e287
Major cleanup of profiler zones, add Caller annotation for all pool.Run
...
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor
PiperOrigin-RevId: 822934530
2025-10-23 01:54:24 -07:00
Nitin Gangahar
9e8ac7e2f0
Use correct offsets in BlobWriter.
...
Updates the FileSize() calls in BlobWriter to instead use a computed offset.
FileSize() may not work with all implementations of File, which can cause issues
while writing.
PiperOrigin-RevId: 822646338
2025-10-22 10:29:04 -07:00
Copybara-Service
64a82ed645
Merge pull request #735 from Hitesh-ed:gemma.cpp-windows-build-fix
...
PiperOrigin-RevId: 822559272
2025-10-22 06:26:29 -07:00
Hitesh K V
027288b5e4
Merge branch 'dev' into gemma.cpp-windows-build-fix
2025-10-22 16:53:48 +05:30
Jan Wassenberg
acede9d682
Warning fix (unused var), Windows build fix (missing member variable)
...
PiperOrigin-RevId: 822172982
2025-10-21 10:17:34 -07:00
Hitesh K V
c55120fc6d
Merge branch 'dev' into gemma.cpp-windows-build-fix
2025-10-16 20:18:09 +05:30
Jan Wassenberg
f59eb2ed72
Remove multi-package support from topology
...
Also no longer assume equal-sized clusters
PiperOrigin-RevId: 820164125
2025-10-16 04:00:35 -07:00