Nikhil Dev Goyal
259b757aef
Use Lookup8 and detail::IsFull(d) in FastSigmoid
...
Fix targeted for scalable architectures
PiperOrigin-RevId: 888633434
2026-03-24 06:36:55 -07:00
Krzysztof Rymski
8a5e37eeb7
Update tests to use the kv_transcoding library to reduce their code size
...
PiperOrigin-RevId: 888600365
2026-03-24 05:06:01 -07:00
Jan Wassenberg
1dedcfd50d
Warning fix: cast enum for HWY_ABORT %d
...
PiperOrigin-RevId: 886242788
2026-03-19 10:11:17 -07:00
Jan Wassenberg
79f2bf7a07
Disable SVE (except SVE2_128) for MatMul due to compiler crash
...
PiperOrigin-RevId: 886190686
2026-03-19 08:24:18 -07:00
Nikhil Dev Goyal
90f3de7f15
Use parallel blend chain path in FastSigmoid on architectures with >= 32 registers
...
PiperOrigin-RevId: 886178215
2026-03-19 07:54:05 -07:00
Nikhil Dev Goyal
50144738f1
Change calculation from (ax+b)/(cx+d) to (x+b')/(c'x+d'). This replaces a MulAdd with an Add, reducing port contention on modern CPUs and thus increasing throughput.
...
Also reduces the need for 1 register to hold b as 1.0 here
PiperOrigin-RevId: 886170146
2026-03-19 07:36:52 -07:00
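The rewrite in 50144738f1 amounts to dividing all four coefficients of (ax+b)/(cx+d) by a, giving b' = b/a, c' = c/a and d' = d/a. The following is a minimal scalar sketch of that identity, not the FastSigmoid code itself; the `Rational` struct and the function names are hypothetical, and the real implementation operates on SIMD vectors.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical coefficient set for r(x) = (a*x + b) / (c*x + d).
struct Rational {
  double a, b, c, d;
};

// Divides every coefficient by a, so the numerator becomes x + b'.
Rational Normalize(const Rational& r) {
  assert(r.a != 0.0);  // the rewrite is only valid for a != 0
  return {1.0, r.b / r.a, r.c / r.a, r.d / r.a};
}

// Original form: one multiply-add in the numerator, one in the denominator.
double EvalOriginal(const Rational& r, double x) {
  return std::fma(r.a, x, r.b) / std::fma(r.c, x, r.d);
}

// Normalized form: a plain add in the numerator. n.a is 1.0 and is never
// read, so it does not need to be kept in a register.
double EvalNormalized(const Rational& n, double x) {
  return (x + n.b) / std::fma(n.c, x, n.d);
}
```

Both forms agree up to rounding, because numerator and denominator are scaled by the same factor 1/a.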
Jan Wassenberg
ceb70203f0
Add min_verbosity to MaybePrint
...
PiperOrigin-RevId: 886094998
2026-03-19 04:22:01 -07:00
Krzysztof Rymski
1a5226e5de
Utilities to convert between different encodings of kv cache
...
PiperOrigin-RevId: 885553004
2026-03-18 06:16:32 -07:00
Nikhil Dev Goyal
0110ddfee7
Fix testing::SrcDir() path resolution in wheat_from_chaff_test
...
Also use a list of acceptable substring matchers for each question instead of just one
PiperOrigin-RevId: 883198819
2026-03-13 09:17:31 -07:00
Jan Wassenberg
529c201eb6
Add/use MaybePrint; also ShowConfig in non-interactive builds
...
PiperOrigin-RevId: 882688835
2026-03-12 11:20:41 -07:00
Krzysztof Rymski
197c1a049c
Fix int8
...
PiperOrigin-RevId: 882611833
2026-03-12 08:43:18 -07:00
The gemma.cpp Authors
d6e836c651
Add phase markers to stderr for high verbosity levels.
...
This change introduces `[ BEGIN PHASE: ... ]` and `[ END PHASE: ... ]` messages printed to stderr when `timing_info.verbosity` is 2 or higher. These markers are added around the prefill, generate, image token generation, and final statistics phases to help in profiling and understanding the execution flow.
PiperOrigin-RevId: 882556076
2026-03-12 06:35:25 -07:00
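One way to emit the phase markers described in d6e836c651 is a small scoped helper, sketched below. Only the marker text and the verbosity threshold of 2 come from the commit message; `ScopedPhase`, `FormatPhaseMarker` and the phase name strings are hypothetical, and the actual change may print the markers differently.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Builds "[ BEGIN PHASE: name ]" or "[ END PHASE: name ]".
std::string FormatPhaseMarker(bool begin, const std::string& phase) {
  return std::string(begin ? "[ BEGIN PHASE: " : "[ END PHASE: ") + phase +
         " ]";
}

// Hypothetical RAII helper: prints the begin marker on construction and the
// end marker on destruction, but only when verbosity is 2 or higher.
class ScopedPhase {
 public:
  ScopedPhase(std::string phase, int verbosity)
      : phase_(std::move(phase)), enabled_(verbosity >= 2) {
    assert(!phase_.empty());
    if (enabled_) {
      std::fprintf(stderr, "%s\n", FormatPhaseMarker(true, phase_).c_str());
    }
  }
  ~ScopedPhase() {
    if (enabled_) {
      std::fprintf(stderr, "%s\n", FormatPhaseMarker(false, phase_).c_str());
    }
  }
  ScopedPhase(const ScopedPhase&) = delete;
  ScopedPhase& operator=(const ScopedPhase&) = delete;

 private:
  std::string phase_;
  bool enabled_;
};
```

Declaring `ScopedPhase phase("prefill", verbosity);` at the top of a block would bracket that block with the two markers on stderr and leave stdout untouched.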
Copybara-Service
e728d45d8e
Merge pull request #866 from salmanmkc:upgrade-github-actions-node24-general
...
PiperOrigin-RevId: 882555945
2026-03-12 06:34:32 -07:00
Jan Wassenberg
cab77f8dc7
Improved timing for image tokens
...
Move to TimingInfo, extra newline before profiler
PiperOrigin-RevId: 881943820
2026-03-11 04:47:56 -07:00
Salman Muin Kayser Chishti
3187ee0f85
Upgrade GitHub Actions to latest versions
...
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-03-11 11:40:58 +00:00
Jan Wassenberg
70cb9cf1c2
Separate profiler output for image token generation
...
PiperOrigin-RevId: 880895239
2026-03-09 09:26:50 -07:00
Ray Smith
bea8b1cdbd
Replaced attention in ViT with flash - 8x speedup of image tokenizer on AMD
...
PiperOrigin-RevId: 880877209
2026-03-09 08:46:04 -07:00
Krzysztof Rymski
029cfd0b33
Int8 + microscaling support for kv cache formats.
...
Multiplication is currently done by converting to the corresponding float format.
Can yield up to 2x improvement for memory-bandwidth-constrained shapes.
PiperOrigin-RevId: 880748493
2026-03-09 02:50:08 -07:00
Ray Smith
d2806fb1dd
Fixed msan error by fixing padding of k_cache and v_cache
...
PiperOrigin-RevId: 879644219
2026-03-06 08:11:17 -08:00
Dani Ferreira Franco Moura
d6c7576024
internal change
...
PiperOrigin-RevId: 879546918
2026-03-06 03:47:11 -08:00
Jan Wassenberg
8d9b9925be
Fix VLM prefill batch size - prompt+tokens
...
PiperOrigin-RevId: 879159709
2026-03-05 11:21:55 -08:00
Nikhil Dev Goyal
5081341200
Use CappedTag to prevent potential out of bound reads.
...
PiperOrigin-RevId: 879141747
2026-03-05 10:40:52 -08:00
Ray Smith
79e640a956
Fixed tsan error.
...
PiperOrigin-RevId: 879069355
2026-03-05 07:59:38 -08:00
Nikhil Dev Goyal
6721dddf38
Implement FastSigmoid.
...
PiperOrigin-RevId: 878453196
2026-03-04 06:12:33 -08:00
Krzysztof Rymski
539d9bb8e7
Change to use faster exponent function
...
PiperOrigin-RevId: 877981568
2026-03-03 09:16:04 -08:00
Ray Smith
49cb438b1e
Rollback of erroneous rollback.
...
PiperOrigin-RevId: 877376165
2026-03-02 06:50:26 -08:00
Jan Wassenberg
fbd44cee42
Fix Windows warnings
...
PiperOrigin-RevId: 877338937
2026-03-02 04:53:25 -08:00
The gemma.cpp Authors
a3d994915f
No public description
...
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith
16c1b29b89
Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and doubled the registers used for the core of flash attention.
...
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
Miguel Lobo
f7f5fd5863
Add ability to load custom models which are fully described by the ModelConfig blob.
...
PiperOrigin-RevId: 877265257
2026-03-02 01:18:33 -08:00
Nikhil Dev Goyal
dd268ddbe8
Add FastGelu activation function in a newly created fast_ops-inl.h file.
...
This replaces the Tanh call with a FastTanh call in the Gelu function written in math-inl.h.
PiperOrigin-RevId: 876339830
2026-02-27 11:14:47 -08:00
Krzysztof Rymski
bdba3bfa63
remove const to fix windows builds
...
PiperOrigin-RevId: 876232691
2026-02-27 06:56:54 -08:00
Viktor Shipitsin
d8a123e4ec
Use a struct to manage the mapping between `AttentionImpl` enum values and their string names, simplifying `GetAttentionImplName` function. Add a test to ensure all valid `AttentionImpl` enums have a corresponding name and can be looked up.
...
PiperOrigin-RevId: 876124604
2026-02-27 01:31:11 -08:00
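The struct-based mapping described in d8a123e4ec can be sketched as a table of (enum, name) pairs that serves lookups in both directions. The enumerators, the name strings and the reverse-lookup function below are hypothetical placeholders; only `AttentionImpl` and `GetAttentionImplName` are named in the commit message.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical enumerators; the real AttentionImpl values may differ.
enum class AttentionImpl { kOld, kFlash, kSentinel };

// One table entry per valid enum value.
struct AttentionImplName {
  AttentionImpl impl;
  const char* name;
};

constexpr AttentionImplName kAttentionImplNames[] = {
    {AttentionImpl::kOld, "old"},
    {AttentionImpl::kFlash, "flash"},
};

// Forward lookup: enum to name, or "unknown" if the table has no entry.
const char* GetAttentionImplName(AttentionImpl impl) {
  for (const AttentionImplName& entry : kAttentionImplNames) {
    if (entry.impl == impl) return entry.name;
  }
  return "unknown";
}

// Reverse lookup (hypothetical): name to enum, or kSentinel if not found.
AttentionImpl GetAttentionImpl(const char* name) {
  for (const AttentionImplName& entry : kAttentionImplNames) {
    if (std::strcmp(entry.name, name) == 0) return entry.impl;
  }
  return AttentionImpl::kSentinel;
}

// Mirrors the kind of test the commit describes: every valid enum value has
// a name, and that name maps back to the same value.
void CheckAllNamed() {
  for (int i = 0; i < static_cast<int>(AttentionImpl::kSentinel); ++i) {
    const AttentionImpl impl = static_cast<AttentionImpl>(i);
    const char* name = GetAttentionImplName(impl);
    assert(std::strcmp(name, "unknown") != 0);
    assert(GetAttentionImpl(name) == impl);
  }
}
```

Keeping the names in a single table means a new enum value needs one added row, and the round-trip check fails if that row is forgotten.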
Jan Wassenberg
c6587efe70
Improve instrumentation for ViT parts
...
PiperOrigin-RevId: 875302990
2026-02-25 13:10:44 -08:00
Krzysztof Rymski
df162ead7c
Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models.
...
It also supports better parallelism for small batch sizes / small models.
It can also use VDPBF16PS for a 2x improvement on AVX-512.
PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
Jan Wassenberg
463a3682be
Internal change
...
PiperOrigin-RevId: 874097322
2026-02-23 08:55:33 -08:00
Krzysztof Rymski
7dc98902d3
Internal changes
...
PiperOrigin-RevId: 872280443
2026-02-19 01:57:58 -08:00
The gemma.cpp Authors
34739fd9f0
Internal changes
...
PiperOrigin-RevId: 871792281
2026-02-18 04:07:36 -08:00
Krzysztof Rymski
c6696342fa
Internal changes
...
PiperOrigin-RevId: 871776998
2026-02-18 03:21:41 -08:00
Ray Smith
76d7951242
Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data.
...
Replaced flag with attention_impl to control which attention to run.
PiperOrigin-RevId: 869694868
2026-02-13 06:05:30 -08:00
Krzysztof Rymski
7e5310b908
Internal changes
...
PiperOrigin-RevId: 867617121
2026-02-09 08:29:15 -08:00
Jan Wassenberg
56fa6e4839
Internal change plus add U8 type, check MatPtrT type at compile time
...
PiperOrigin-RevId: 867582875
2026-02-09 06:54:11 -08:00
Hana Joo
7c19b31c66
Automated Code Change
...
PiperOrigin-RevId: 865932055
2026-02-05 07:16:24 -08:00
Jan Wassenberg
2751a194be
Fix paligemma: must subtract image tokens from prompt length
...
PiperOrigin-RevId: 865905454
2026-02-05 05:59:36 -08:00
Krzysztof Rymski
60eed010ba
Internal changes
...
PiperOrigin-RevId: 862680527
2026-01-29 04:48:29 -08:00
Krzysztof Rymski
16a7ba2d6e
Internal changes
...
PiperOrigin-RevId: 854171429
2026-01-09 06:35:36 -08:00
Jan Wassenberg
6d43d6ee19
Build fix for Arm SVE (invalid template arg on op)
...
PiperOrigin-RevId: 854110884
2026-01-09 02:56:03 -08:00
The gemma.cpp Authors
95592a574e
Build fix for Arm SVE (explicit namespace qualification)
...
PiperOrigin-RevId: 853864585
2026-01-08 13:29:45 -08:00
Jan Wassenberg
42e9cf557d
Internal change / remove unused PrintSpeed
...
PiperOrigin-RevId: 853694463
2026-01-08 05:26:31 -08:00
Balazs Racz
384c390181
Allow overriding hardcoded max_seq_len by cmdline argument seq_len.
...
Adds a SetMaxSeqLen method to ModelConfig to handle updating both max_seq_len and global attention window sizes. The Gemma constructor now checks if the provided inference seq_len exceeds the model's max_seq_len and, if so, emits a warning and updates the config.
This prevents clipping context to the hard-coded maximum.
PiperOrigin-RevId: 853676074
2026-01-08 04:28:59 -08:00
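The seq_len override in 384c390181 can be illustrated with the simplified sketch below. `ModelConfigSketch` is a hypothetical stand-in for ModelConfig, and treating a layer as global when its window equals max_seq_len is an assumption made for illustration; the real SetMaxSeqLen may identify global layers differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical, heavily simplified stand-in for ModelConfig.
struct ModelConfigSketch {
  size_t max_seq_len = 0;
  // One window size per layer. Assumption: a window equal to max_seq_len
  // marks a global-attention layer; smaller windows are local.
  std::vector<size_t> attention_window_sizes;

  // Updates max_seq_len and widens global windows to match it.
  void SetMaxSeqLen(size_t new_max_seq_len) {
    assert(new_max_seq_len != 0);
    for (size_t& window : attention_window_sizes) {
      if (window == max_seq_len) window = new_max_seq_len;
    }
    max_seq_len = new_max_seq_len;
  }
};

// Sketch of the constructor-time check: warn and update the config only if
// the requested inference seq_len exceeds the model's max_seq_len.
void ApplyInferenceSeqLen(ModelConfigSketch& config, size_t seq_len) {
  if (seq_len > config.max_seq_len) {
    std::fprintf(stderr,
                 "Warning: seq_len %zu exceeds max_seq_len %zu; updating.\n",
                 seq_len, config.max_seq_len);
    config.SetMaxSeqLen(seq_len);
  }
}
```

Updating the global windows together with max_seq_len is what keeps the longer context from being clipped to the old limit, while local layers keep their smaller windows.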