Jan Wassenberg
fbd44cee42
Fix Windows warnings
...
PiperOrigin-RevId: 877338937
2026-03-02 04:53:25 -08:00
The gemma.cpp Authors
a3d994915f
No public description
...
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith
16c1b29b89
Rewrite flash attention: use BF16, transpose k and v, rewrite the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
...
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
Miguel Lobo
f7f5fd5863
Add ability to load custom models which are fully described by the ModelConfig blob.
...
PiperOrigin-RevId: 877265257
2026-03-02 01:18:33 -08:00
Nikhil Dev Goyal
dd268ddbe8
Add FastGelu activation function in a newly created fast_ops-inl.h file.
...
This replaces the Tanh call with a FastTanh call in the Gelu function in math-inl.h.
PiperOrigin-RevId: 876339830
2026-02-27 11:14:47 -08:00
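As a rough illustration of the FastGelu change above, here is a minimal scalar sketch of the tanh-based GELU approximation with a pluggable tanh. The names `GeluWith` and `FastTanhSketch` are hypothetical, and the clamped rational tanh is only a stand-in chosen for this example; the actual FastTanh in fast_ops-inl.h is vectorized and may use a different approximation.

```cpp
#include <cmath>

// Tanh-based GELU approximation:
//   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))).
// TanhFunc is any callable float -> float, so a cheaper tanh can be swapped in.
// (Hypothetical helper, not the gemma.cpp API.)
template <typename TanhFunc>
float GeluWith(float x, TanhFunc tanh_func) {
  const float kSqrt2OverPi = 0.7978845608f;
  const float kCubic = 0.044715f;
  const float inner = kSqrt2OverPi * (x + kCubic * x * x * x);
  return 0.5f * x * (1.0f + tanh_func(inner));
}

// Assumed stand-in for a fast tanh: a rational approximation that reaches
// exactly +/-1 at |x| = 3 and is clamped beyond that point.
inline float FastTanhSketch(float x) {
  if (x > 3.0f) return 1.0f;
  if (x < -3.0f) return -1.0f;
  const float x2 = x * x;
  return x * (27.0f + x2) / (27.0f + 9.0f * x2);
}

// Reference tanh for comparison against the approximation.
inline float ExactTanh(float x) { return std::tanh(x); }
```

Calling `GeluWith(x, ExactTanh)` gives the reference result, and `GeluWith(x, FastTanhSketch)` gives the cheaper variant, so the two can be compared directly in a test.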
Krzysztof Rymski
bdba3bfa63
Remove const to fix Windows builds
...
PiperOrigin-RevId: 876232691
2026-02-27 06:56:54 -08:00
Viktor Shipitsin
d8a123e4ec
Use a struct to manage the mapping between `AttentionImpl` enum values and their string names, simplifying `GetAttentionImplName` function. Add a test to ensure all valid `AttentionImpl` enums have a corresponding name and can be looked up.
...
PiperOrigin-RevId: 876124604
2026-02-27 01:31:11 -08:00
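The struct-based mapping described in the commit above could look roughly like the following sketch. The enum values, the `kSentinel` marker, the string names and `GetAttentionImplFromName` are all assumptions made for illustration; only `AttentionImpl` and `GetAttentionImplName` are named in the commit message.

```cpp
#include <cstring>

// Hypothetical enum values; the real AttentionImpl may differ.
enum class AttentionImpl { kOld, kFlash, kSentinel };

// One table entry pairs an enum value with its string name.
struct AttentionImplName {
  AttentionImpl impl;
  const char* name;
};

// Single source of truth for both lookup directions.
constexpr AttentionImplName kAttentionImplNames[] = {
    {AttentionImpl::kOld, "old"},
    {AttentionImpl::kFlash, "flash"},
};

// Returns the name for `impl`, or "unknown" if it has no table entry.
inline const char* GetAttentionImplName(AttentionImpl impl) {
  for (const AttentionImplName& entry : kAttentionImplNames) {
    if (entry.impl == impl) return entry.name;
  }
  return "unknown";
}

// Reverse lookup (hypothetical helper): returns true and sets `*out`
// if `name` matches a table entry.
inline bool GetAttentionImplFromName(const char* name, AttentionImpl* out) {
  for (const AttentionImplName& entry : kAttentionImplNames) {
    if (std::strcmp(entry.name, name) == 0) {
      *out = entry.impl;
      return true;
    }
  }
  return false;
}
```

A test in the spirit of the one the commit mentions can then loop over every value below `kSentinel` and check that its name is not "unknown" and that the name maps back to the same value.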
Jan Wassenberg
c6587efe70
Improve instrumentation for ViT parts
...
PiperOrigin-RevId: 875302990
2026-02-25 13:10:44 -08:00
Krzysztof Rymski
df162ead7c
Implement tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models.
...
It also supports better parallelism for small batch sizes / small models.
It can also use VDPBF16PS for a 2x improvement on AVX-512.
PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
Jan Wassenberg
463a3682be
Internal change
...
PiperOrigin-RevId: 874097322
2026-02-23 08:55:33 -08:00
Krzysztof Rymski
7dc98902d3
Internal changes
...
PiperOrigin-RevId: 872280443
2026-02-19 01:57:58 -08:00
The gemma.cpp Authors
34739fd9f0
Internal changes
...
PiperOrigin-RevId: 871792281
2026-02-18 04:07:36 -08:00
Krzysztof Rymski
c6696342fa
Internal changes
...
PiperOrigin-RevId: 871776998
2026-02-18 03:21:41 -08:00
Ray Smith
76d7951242
Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data.
...
Replaced flag with attention_impl to control which attention to run.
PiperOrigin-RevId: 869694868
2026-02-13 06:05:30 -08:00
Krzysztof Rymski
7e5310b908
Internal changes
...
PiperOrigin-RevId: 867617121
2026-02-09 08:29:15 -08:00
Jan Wassenberg
56fa6e4839
Internal change plus add U8 type, check MatPtrT type at compile time
...
PiperOrigin-RevId: 867582875
2026-02-09 06:54:11 -08:00
Hana Joo
7c19b31c66
Automated Code Change
...
PiperOrigin-RevId: 865932055
2026-02-05 07:16:24 -08:00
Jan Wassenberg
2751a194be
Fix paligemma: must subtract image tokens from prompt length
...
PiperOrigin-RevId: 865905454
2026-02-05 05:59:36 -08:00
Krzysztof Rymski
60eed010ba
Internal changes
...
PiperOrigin-RevId: 862680527
2026-01-29 04:48:29 -08:00
Krzysztof Rymski
16a7ba2d6e
Internal changes
...
PiperOrigin-RevId: 854171429
2026-01-09 06:35:36 -08:00
Jan Wassenberg
6d43d6ee19
Build fix for Arm SVE (invalid template arg on op)
...
PiperOrigin-RevId: 854110884
2026-01-09 02:56:03 -08:00
The gemma.cpp Authors
95592a574e
Build fix for Arm SVE (explicit namespace qualification)
...
PiperOrigin-RevId: 853864585
2026-01-08 13:29:45 -08:00
Jan Wassenberg
42e9cf557d
Internal change / remove unused PrintSpeed
...
PiperOrigin-RevId: 853694463
2026-01-08 05:26:31 -08:00
Balazs Racz
384c390181
Allow overriding hardcoded max_seq_len by cmdline argument seq_len.
...
Adds a SetMaxSeqLen method to ModelConfig to handle updating both max_seq_len and global attention window sizes. The Gemma constructor now checks if the provided inference seq_len exceeds the model's max_seq_len and, if so, emits a warning and updates the config.
This prevents clipping context to the hard-coded maximum.
PiperOrigin-RevId: 853676074
2026-01-08 04:28:59 -08:00
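The seq_len override described above can be sketched as follows. This is a simplified model, not the actual ModelConfig: the `ModelConfigSketch` fields, the rule that a layer counts as "global" when its window equals the old max_seq_len, and the helper `MaybeExtendSeqLen` are all assumptions made for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Simplified stand-in for ModelConfig (field names are assumptions).
struct ModelConfigSketch {
  std::size_t max_seq_len;
  std::vector<std::size_t> attention_window_sizes;  // one entry per layer

  // Updates max_seq_len and widens global attention windows to match.
  // Assumption: a layer is "global" if its window equals the old max_seq_len;
  // local (sliding-window) layers keep their smaller window.
  void SetMaxSeqLen(std::size_t new_max_seq_len) {
    for (std::size_t& window : attention_window_sizes) {
      if (window == max_seq_len) window = new_max_seq_len;
    }
    max_seq_len = new_max_seq_len;
  }
};

// Mirrors the constructor check from the commit message: if the requested
// inference seq_len exceeds the model's maximum, warn and update the config.
// Returns true if the config was changed.
inline bool MaybeExtendSeqLen(ModelConfigSketch& config, std::size_t seq_len) {
  if (seq_len <= config.max_seq_len) return false;
  std::fprintf(stderr, "Warning: seq_len %zu exceeds max_seq_len %zu; raising.\n",
               seq_len, config.max_seq_len);
  config.SetMaxSeqLen(seq_len);
  return true;
}
```

Updating both fields in one method keeps the global windows from silently clipping context to the old maximum after max_seq_len has been raised.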
Jan Wassenberg
aeade052c6
Move AssertClose to test_util, add U16
...
PiperOrigin-RevId: 853321311
2026-01-07 10:33:20 -08:00
Krzysztof Rymski
2ee1fac74c
Internal changes
...
PiperOrigin-RevId: 853138600
2026-01-07 01:21:37 -08:00
Jan Wassenberg
1605925d1e
Add int8 quantization stats
...
Compute the L1 error and Shannon SNR (higher is better).
PiperOrigin-RevId: 846832280
2025-12-19 12:43:03 -08:00
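To make the two statistics concrete, here is a minimal sketch that round-trips values through symmetric int8 quantization and reports the L1 error and a signal-to-noise ratio. The commit does not define "Shannon SNR", so this example assumes the ordinary power ratio in decibels; the function and struct names, and the max-abs scaling scheme, are likewise assumptions and not the gemma.cpp implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical result struct for this sketch.
struct QuantStatsSketch {
  double l1_error;  // sum of |original - dequantized|
  double snr_db;    // 10 * log10(signal power / noise power); higher is better
};

// Quantizes to int8 with a single max-abs scale, dequantizes, and measures
// the reconstruction error.
inline QuantStatsSketch Int8RoundTripStats(const std::vector<float>& values) {
  float max_abs = 0.0f;
  for (float v : values) max_abs = std::max(max_abs, std::fabs(v));
  const float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

  double l1 = 0.0, signal = 0.0, noise = 0.0;
  for (float v : values) {
    const std::int8_t q = static_cast<std::int8_t>(std::lround(v / scale));
    const double err =
        static_cast<double>(v) - static_cast<double>(q) * static_cast<double>(scale);
    l1 += std::fabs(err);
    signal += static_cast<double>(v) * static_cast<double>(v);
    noise += err * err;
  }

  QuantStatsSketch stats;
  stats.l1_error = l1;
  // A perfect round trip has zero noise; report infinity in that case.
  stats.snr_db = noise > 0.0 ? 10.0 * std::log10(signal / noise)
                             : std::numeric_limits<double>::infinity();
  return stats;
}
```

Values that sit exactly on the quantization grid give a near-zero L1 error and a very high SNR, while values between grid points lower the SNR.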
Copybara-Service
11aa16a13d
Merge pull request #810 from salmanmkc:upgrade-github-actions-node24
...
PiperOrigin-RevId: 846692015
2025-12-19 05:27:14 -08:00
Krzysztof Rymski
08a0760271
Internal changes
...
PiperOrigin-RevId: 846663686
2025-12-19 03:43:15 -08:00
Krzysztof Rymski
b73a9ede8f
Internal changes
...
PiperOrigin-RevId: 846648337
2025-12-19 02:46:18 -08:00
Balazs Racz
0ac55f71ed
Avoid using Row() for unaligned storage.
...
PiperOrigin-RevId: 846214605
2025-12-18 05:10:57 -08:00
Krzysztof Rymski
6661d3a60c
Internal changes
...
PiperOrigin-RevId: 846140314
2025-12-18 01:26:43 -08:00
Liam Miller-Cushon
142e6a7e9c
No public description
...
PiperOrigin-RevId: 846030124
2025-12-17 20:10:54 -08:00
Phil Culliton
b8a409dbba
Use hn::Sub for vector subtraction in flash attention.
...
PiperOrigin-RevId: 845883321
2025-12-17 12:57:34 -08:00
Balazs Racz
596bdfe5af
Separate monolithic gemma_lib library into more specific cc_library targets.
...
Creates new cc_library targets for :attention, :tensor_stats and :activations. Eliminates cyclic dependencies between these libraries.
PiperOrigin-RevId: 845690136
2025-12-17 03:31:16 -08:00
Salman Chishti
a4c78d4454
Merge branch 'dev' into upgrade-github-actions-node24
2025-12-16 14:47:59 +00:00
Salman Muin Kayser Chishti
b66aa115ac
Upgrade GitHub Actions for Node 24 compatibility
2025-12-16 14:26:24 +00:00
Balazs Racz
baa69dfb78
Pass the entire runtime_config into the activations constructor.
...
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Krzysztof Rymski
44dfd69b9b
Internal changes
...
PiperOrigin-RevId: 844759322
2025-12-15 07:14:37 -08:00
Jan Wassenberg
0c64987a96
Abort if args are unrecognized, refactor argument passing
...
This catches typos/incorrect usage.
Refactor: group Loader/Threading/Inference into GemmaArgs.
All *Args ctors now have an extra ConsumedArgs& argument.
PiperOrigin-RevId: 844690553
2025-12-15 03:18:45 -08:00
Jan Wassenberg
f50550f4ce
Warning fixes (sign mismatch), switch default
...
PiperOrigin-RevId: 844679375
2025-12-15 02:41:19 -08:00
Martin Stolle
506fb22be7
No public description
...
PiperOrigin-RevId: 843665619
2025-12-12 06:37:17 -08:00
Balazs Racz
338cd8a36e
Factors out a new cc_library `:query` from `:gemma-lib`.
...
Moves query-related structs/classes to gemma/query.h.
This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly.
PiperOrigin-RevId: 843604293
2025-12-12 02:53:56 -08:00
Jan Wassenberg
73c3627b67
Add tensor stats and output
...
tensor_info: add missing header
io: fix mode
weights.h: add layer_idx to LayerWeightsPtrs
PiperOrigin-RevId: 843531051
2025-12-11 22:52:46 -08:00
Martin Stolle
bfc0dfcfca
Enable flags= parsing
...
PiperOrigin-RevId: 843103750
2025-12-11 01:17:59 -08:00
Martin Stolle
78deacc357
Make attention configurable on the command line.
...
PiperOrigin-RevId: 842760721
2025-12-10 09:34:06 -08:00
Martin Stolle
2441ff01bf
internal change
...
PiperOrigin-RevId: 842749037
2025-12-10 09:01:15 -08:00
Krzysztof Rymski
64178ace38
Internal changes
...
PiperOrigin-RevId: 842727112
2025-12-10 07:55:17 -08:00
Martin Stolle
9689fc82f9
internal change
...
PiperOrigin-RevId: 842205671
2025-12-09 06:17:08 -08:00
Krzysztof Rymski
64d700cab5
Internal changes
...
PiperOrigin-RevId: 842194766
2025-12-09 05:42:03 -08:00