Commit Graph

882 Commits

Author SHA1 Message Date
Krzysztof Rymski 0bce91c4f4 Change to use faster exponent function
PiperOrigin-RevId: 875649021
2026-02-26 07:04:02 -08:00
Jan Wassenberg c6587efe70 Improve instrumentation for ViT parts
PiperOrigin-RevId: 875302990
2026-02-25 13:10:44 -08:00
Krzysztof Rymski df162ead7c Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models.
It also supports better parallelism for small batch sizes / small models.
It can also use VDPBF16PS for a 2x improvement on AVX-512.

PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
Jan Wassenberg 463a3682be Internal change
PiperOrigin-RevId: 874097322
2026-02-23 08:55:33 -08:00
Krzysztof Rymski 7dc98902d3 Internal changes
PiperOrigin-RevId: 872280443
2026-02-19 01:57:58 -08:00
The gemma.cpp Authors 34739fd9f0 Internal changes
PiperOrigin-RevId: 871792281
2026-02-18 04:07:36 -08:00
Krzysztof Rymski c6696342fa Internal changes
PiperOrigin-RevId: 871776998
2026-02-18 03:21:41 -08:00
Ray Smith 76d7951242 Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data.
Replaced flag with attention_impl to control which attention to run.

PiperOrigin-RevId: 869694868
2026-02-13 06:05:30 -08:00
Krzysztof Rymski 7e5310b908 Internal changes
PiperOrigin-RevId: 867617121
2026-02-09 08:29:15 -08:00
Jan Wassenberg 56fa6e4839 Internal change plus add U8 type, check MatPtrT type at compile time
PiperOrigin-RevId: 867582875
2026-02-09 06:54:11 -08:00
Hana Joo 7c19b31c66 Automated Code Change
PiperOrigin-RevId: 865932055
2026-02-05 07:16:24 -08:00
Jan Wassenberg 2751a194be Fix paligemma: must subtract image tokens from prompt length
PiperOrigin-RevId: 865905454
2026-02-05 05:59:36 -08:00
Krzysztof Rymski 60eed010ba Internal changes
PiperOrigin-RevId: 862680527
2026-01-29 04:48:29 -08:00
Krzysztof Rymski 16a7ba2d6e Internal changes
PiperOrigin-RevId: 854171429
2026-01-09 06:35:36 -08:00
Jan Wassenberg 6d43d6ee19 Build fix for Arm SVE (invalid template arg on op)
PiperOrigin-RevId: 854110884
2026-01-09 02:56:03 -08:00
The gemma.cpp Authors 95592a574e Build fix for Arm SVE (explicit namespace qualification)
PiperOrigin-RevId: 853864585
2026-01-08 13:29:45 -08:00
Jan Wassenberg 42e9cf557d Internal change / remove unused PrintSpeed
PiperOrigin-RevId: 853694463
2026-01-08 05:26:31 -08:00
Balazs Racz 384c390181 Allow overriding hardcoded max_seq_len by cmdline argument seq_len.
Adds a SetMaxSeqLen method to ModelConfig to handle updating both max_seq_len and global attention window sizes. The Gemma constructor now checks if the provided inference seq_len exceeds the model's max_seq_len and, if so, emits a warning and updates the config.

This prevents clipping context to the hard-coded maximum.

PiperOrigin-RevId: 853676074
2026-01-08 04:28:59 -08:00
Jan Wassenberg aeade052c6 Move AssertClose to test_util, add U16
PiperOrigin-RevId: 853321311
2026-01-07 10:33:20 -08:00
Krzysztof Rymski 2ee1fac74c Internal changes
PiperOrigin-RevId: 853138600
2026-01-07 01:21:37 -08:00
Jan Wassenberg 1605925d1e Add int8 quantization stats
Compute the L1 error and Shannon SNR (higher is better).

PiperOrigin-RevId: 846832280
2025-12-19 12:43:03 -08:00
Copybara-Service 11aa16a13d Merge pull request #810 from salmanmkc:upgrade-github-actions-node24
PiperOrigin-RevId: 846692015
2025-12-19 05:27:14 -08:00
Krzysztof Rymski 08a0760271 Internal changes
PiperOrigin-RevId: 846663686
2025-12-19 03:43:15 -08:00
Krzysztof Rymski b73a9ede8f Internal changes
PiperOrigin-RevId: 846648337
2025-12-19 02:46:18 -08:00
Balazs Racz 0ac55f71ed Avoid using Row() for unaligned storage.
PiperOrigin-RevId: 846214605
2025-12-18 05:10:57 -08:00
Krzysztof Rymski 6661d3a60c Internal changes
PiperOrigin-RevId: 846140314
2025-12-18 01:26:43 -08:00
Liam Miller-Cushon 142e6a7e9c No public description
PiperOrigin-RevId: 846030124
2025-12-17 20:10:54 -08:00
Phil Culliton b8a409dbba Use hn::Sub for vector subtraction in flash attention.
PiperOrigin-RevId: 845883321
2025-12-17 12:57:34 -08:00
Balazs Racz 596bdfe5af Separate monolithic gemma_lib library into more specific cc_library targets.
Creates new cc_library targets for :attention, :tensor_stats and :activations. Eliminates cyclic dependencies between these libraries.

PiperOrigin-RevId: 845690136
2025-12-17 03:31:16 -08:00
Salman Chishti a4c78d4454 Merge branch 'dev' into upgrade-github-actions-node24
2025-12-16 14:47:59 +00:00
Salman Muin Kayser Chishti b66aa115ac Upgrade GitHub Actions for Node 24 compatibility
2025-12-16 14:26:24 +00:00
Balazs Racz baa69dfb78 Passes the entire runtime_config into the activations constructor.
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Krzysztof Rymski 44dfd69b9b Internal changes
PiperOrigin-RevId: 844759322
2025-12-15 07:14:37 -08:00
Jan Wassenberg 0c64987a96 Abort if args are unrecognized, refactor argument passing
This catches typos/incorrect usage.
Refactor: group Loader/Threading/Inference into GemmaArgs.
All *Args ctors now have an extra ConsumedArgs& argument.
PiperOrigin-RevId: 844690553
2025-12-15 03:18:45 -08:00
Jan Wassenberg f50550f4ce Warning fixes (sign mismatch), switch default
PiperOrigin-RevId: 844679375
2025-12-15 02:41:19 -08:00
Martin Stolle 506fb22be7 No public description
PiperOrigin-RevId: 843665619
2025-12-12 06:37:17 -08:00
Balazs Racz 338cd8a36e Factors out a new cc_library `:query` from `:gemma-lib`.
Moves query-related structs/classes to gemma/query.h.

This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.

PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg 73c3627b67 Add tensor stats and output
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Martin Stolle bfc0dfcfca Enable flags= parsing
PiperOrigin-RevId: 843103750
2025-12-11 01:17:59 -08:00
Martin Stolle 78deacc357 Make attention configurable on the command line.
PiperOrigin-RevId: 842760721
2025-12-10 09:34:06 -08:00
Martin Stolle 2441ff01bf internal change
PiperOrigin-RevId: 842749037
2025-12-10 09:01:15 -08:00
Krzysztof Rymski 64178ace38 Internal changes
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle 9689fc82f9 internal change
PiperOrigin-RevId: 842205671
2025-12-09 06:17:08 -08:00
Krzysztof Rymski 64d700cab5 Internal changes
PiperOrigin-RevId: 842194766
2025-12-09 05:42:03 -08:00
Martin Stolle 14a9ecf21d Factor out SumHeads
PiperOrigin-RevId: 842138081
2025-12-09 02:23:16 -08:00
Martin Stolle 1014ae9e2a Adding a simple test for GemmaAttention
PiperOrigin-RevId: 842135414
2025-12-09 02:13:03 -08:00
Jan Wassenberg 5a6895c609 Avoid warning when OS affinity limits us to the second socket
Also simplify NumSMT, detect from .smt field directly

PiperOrigin-RevId: 841749486
2025-12-08 07:10:43 -08:00
Martin Stolle b510ba2ab2 Improve clarity of indices II
Sorry, didn't see this one before.

PiperOrigin-RevId: 840218378
2025-12-04 06:33:33 -08:00
Martin Stolle 9348048885 Clean up toPtrs to delegate to toPtr
PiperOrigin-RevId: 840214969
2025-12-04 06:22:04 -08:00
Krzysztof Rymski 2b4436beb6 Internal changes
PiperOrigin-RevId: 840151004
2025-12-04 02:37:53 -08:00