Jan Wassenberg
26bf1e16dc
Remove old attention, superseded by flash
...
PiperOrigin-RevId: 900671724
2026-04-16 05:36:11 -07:00
Ray Smith
221d8df516
Increased max_tbatch_size to kMaxBatchSize. Gives 1.5x speed-up for prefill on both Intel and AMD machines.
...
Shrank intermediate arrays used in matmul to reduce memory use.
PiperOrigin-RevId: 899579842
2026-04-14 07:36:44 -07:00
Ray Smith
a29e2fc655
Fixed bug in PackedBytes - not using override_rows.
...
PiperOrigin-RevId: 896518708
2026-04-08 08:34:11 -07:00
Copybara-Service
366b143fbf
Merge pull request #891 from texasich:clean-gcc15-fix
...
PiperOrigin-RevId: 895822257
2026-04-07 04:32:10 -07:00
Nikhil Dev Goyal
70513a1e0f
Use FastExpMinusOrZero in Softmax().
...
PiperOrigin-RevId: 895740071
2026-04-07 01:19:39 -07:00
Nikhil Dev Goyal
f01cc59218
Reformat
...
PiperOrigin-RevId: 895729770
2026-04-07 00:56:30 -07:00
texasich
f62856eccd
Refactor build workflow to improve GCC 15 support and streamline job configurations
2026-04-05 20:20:02 -05:00
texasich
35274a787a
Add artifact archiving step to build workflow
2026-04-05 20:15:44 -05:00
texasich
9d912dc2b6
Add support for GCC 15 by disabling AVX10.2 target in Highway
2026-04-05 20:10:56 -05:00
texasich
276052470e
Update sentencepiece dependency to latest commit for improved compatibility
2026-04-05 20:10:56 -05:00
Jan Wassenberg
3892763e4e
Use HWY_MEMBER_VAR_MAYBE_UNUSED for members
...
PiperOrigin-RevId: 893428331
2026-04-02 04:19:42 -07:00
Nikhil Dev Goyal
8d2fcb3f12
Replace remaining occurrences of Exp with FastExpMinusOrZero in flash attention.
...
PiperOrigin-RevId: 891691155
2026-03-30 06:44:56 -07:00
Copybara-Service
0da94e5035
Merge pull request #833 from brendandahl:emscripten-cmake
...
PiperOrigin-RevId: 890530097
2026-03-27 10:40:21 -07:00
Brendan Dahl
b652581bcd
Support building with Emscripten
...
Update CMake configuration and utility functions to enable compilation
with Emscripten. This includes setting Wasm-specific flags like
memory64 and SIMD, implementing platform-specific memory detection, and
adding guards for features like OpenSSL that may be unavailable in a
web environment.
2026-03-27 17:03:35 +00:00
Brendan Dahl
20f2570c96
Fix namespace references in api_client.cc
...
Qualify color constants and APIClient with the gcpp namespace in
gemma/api_client.cc to resolve potential symbol lookup issues.
2026-03-27 17:01:05 +00:00
Krzysztof Rymski
2344488566
Internal changes
...
PiperOrigin-RevId: 889294548
2026-03-25 09:46:12 -07:00
Jan Wassenberg
c0064bdd6b
Warning fix (size_t vs u64 in format string)
...
PiperOrigin-RevId: 889180151
2026-03-25 05:12:49 -07:00
Krzysztof Rymski
f56d18dd68
Improvements to inference using int8-compressed KV caches
...
Multiplication is done using int16*int16 multiplication instructions to avoid expensive conversion to f32/bf16.
2x speedup on Zen 3.
PiperOrigin-RevId: 888690192
2026-03-24 08:51:30 -07:00
Nikhil Dev Goyal
259b757aef
Use Lookup8 and detail::IsFull(d) in FastSigmoid
...
Fix targeted for scalable architectures
PiperOrigin-RevId: 888633434
2026-03-24 06:36:55 -07:00
Krzysztof Rymski
8a5e37eeb7
Updates to tests to use kv_transcoding library to reduce their code size
...
PiperOrigin-RevId: 888600365
2026-03-24 05:06:01 -07:00
Jan Wassenberg
1dedcfd50d
Warning fix: cast enum for HWY_ABORT %d
...
PiperOrigin-RevId: 886242788
2026-03-19 10:11:17 -07:00
Jan Wassenberg
79f2bf7a07
Disable SVE (except SVE2_128) for MatMul due to compiler crash
...
PiperOrigin-RevId: 886190686
2026-03-19 08:24:18 -07:00
Nikhil Dev Goyal
90f3de7f15
Use parallel blend chain path in FastSigmoid on architectures having >=32 registers
...
PiperOrigin-RevId: 886178215
2026-03-19 07:54:05 -07:00
Nikhil Dev Goyal
50144738f1
Change calculation from (ax+b)/(cx+d) to (x+b')/(c'x+d'). This replaces a MulAdd with an Add, reducing port contention on modern CPUs and thus increasing throughput.
...
Also reduces the need for 1 register to hold b as 1.0 here
PiperOrigin-RevId: 886170146
2026-03-19 07:36:52 -07:00
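The rational-function rewrite in the entry above can be checked with a small scalar sketch. It assumes a != 0 and divides the numerator and denominator by a, so that b' = b/a, c' = c/a and d' = d/a. The struct and function names and the coefficient values used with them are hypothetical, chosen only for illustration; they are not taken from gemma.cpp.

```cpp
#include <cmath>

// Hypothetical coefficient holder for (a*x + b) / (c*x + d).
struct Rational {
  float a, b, c, d;
};

// Hypothetical holder for the normalized form (x + b') / (c'*x + d').
struct Normalized {
  float b, c, d;
};

// Original form: the numerator needs a multiply-add (a*x + b).
inline float EvalOriginal(const Rational& r, float x) {
  return (r.a * x + r.b) / (r.c * x + r.d);
}

// Divide numerator and denominator by a (assumes a != 0).
inline Normalized Normalize(const Rational& r) {
  return Normalized{r.b / r.a, r.c / r.a, r.d / r.a};
}

// Normalized form: the numerator is a plain add (x + b'), and no
// constant is needed for the coefficient of x in the numerator.
inline float EvalNormalized(const Normalized& n, float x) {
  return (x + n.b) / (n.c * x + n.d);
}
```

Both forms compute the same value up to rounding, because scaling the numerator and denominator by the same nonzero factor leaves the quotient unchanged.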
Jan Wassenberg
ceb70203f0
Add min_verbosity to MaybePrint
...
PiperOrigin-RevId: 886094998
2026-03-19 04:22:01 -07:00
Krzysztof Rymski
1a5226e5de
Utilities to convert between different encodings of kv cache
...
PiperOrigin-RevId: 885553004
2026-03-18 06:16:32 -07:00
Nikhil Dev Goyal
0110ddfee7
Fix testing::SrcDir() path resolution in wheat_from_chaff_test
...
Also use a list of acceptable substring matchers for each question instead of just one
PiperOrigin-RevId: 883198819
2026-03-13 09:17:31 -07:00
Jan Wassenberg
529c201eb6
Add/use MaybePrint; also ShowConfig in non-interactive builds
...
PiperOrigin-RevId: 882688835
2026-03-12 11:20:41 -07:00
Krzysztof Rymski
197c1a049c
Fix int8
...
PiperOrigin-RevId: 882611833
2026-03-12 08:43:18 -07:00
The gemma.cpp Authors
d6e836c651
Add phase markers to stderr for high verbosity levels.
...
This change introduces `[ BEGIN PHASE: ... ]` and `[ END PHASE: ... ]` messages printed to stderr when `timing_info.verbosity` is 2 or higher. These markers are added around the prefill, generate, image token generation, and final statistics phases to help in profiling and understanding the execution flow.
PiperOrigin-RevId: 882556076
2026-03-12 06:35:25 -07:00
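One way to emit paired markers like those described in the entry above is a scope guard that prints on construction and destruction. This is a minimal sketch, not the code from that change: the class name `PhaseMarker`, its constructor signature, the injectable output stream, and the trailing newline are all assumptions made for illustration. Only the `[ BEGIN PHASE: ... ]` / `[ END PHASE: ... ]` text and the verbosity threshold of 2 come from the commit message.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical RAII helper: prints a BEGIN marker when constructed and an
// END marker when destroyed, but only if verbosity is 2 or higher.
class PhaseMarker {
 public:
  PhaseMarker(const std::string& name, int verbosity,
              std::ostream& out = std::cerr)
      : name_(name), enabled_(verbosity >= 2), out_(out) {
    if (enabled_) out_ << "[ BEGIN PHASE: " << name_ << " ]\n";
  }

  ~PhaseMarker() {
    if (enabled_) out_ << "[ END PHASE: " << name_ << " ]\n";
  }

  // Non-copyable so each phase prints exactly one BEGIN/END pair.
  PhaseMarker(const PhaseMarker&) = delete;
  PhaseMarker& operator=(const PhaseMarker&) = delete;

 private:
  std::string name_;
  bool enabled_;
  std::ostream& out_;
};
```

Wrapping a phase in a block such as `{ PhaseMarker m("prefill", verbosity); /* work */ }` keeps the END marker tied to scope exit, including early returns.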
Copybara-Service
e728d45d8e
Merge pull request #866 from salmanmkc:upgrade-github-actions-node24-general
...
PiperOrigin-RevId: 882555945
2026-03-12 06:34:32 -07:00
Jan Wassenberg
cab77f8dc7
Improved timing for image tokens
...
Move to TimingInfo, extra newline before profiler
PiperOrigin-RevId: 881943820
2026-03-11 04:47:56 -07:00
Salman Muin Kayser Chishti
3187ee0f85
Upgrade GitHub Actions to latest versions
...
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-03-11 11:40:58 +00:00
Jan Wassenberg
70cb9cf1c2
Separate profiler output for image token generation
...
PiperOrigin-RevId: 880895239
2026-03-09 09:26:50 -07:00
Ray Smith
bea8b1cdbd
Replaced attention in ViT with flash - 8x speedup of image tokenizer on AMD
...
PiperOrigin-RevId: 880877209
2026-03-09 08:46:04 -07:00
Krzysztof Rymski
029cfd0b33
Int8 + microscaling support for kv cache formats.
...
Right now multiplication is done by converting to the corresponding float format.
Can yield up to 2x improvements for memory-bandwidth-constrained shapes.
PiperOrigin-RevId: 880748493
2026-03-09 02:50:08 -07:00
Ray Smith
d2806fb1dd
Fixed msan error by fixing padding of k_cache and v_cache
...
PiperOrigin-RevId: 879644219
2026-03-06 08:11:17 -08:00
Dani Ferreira Franco Moura
d6c7576024
internal change
...
PiperOrigin-RevId: 879546918
2026-03-06 03:47:11 -08:00
Jan Wassenberg
8d9b9925be
Fix VLM prefill batch size - prompt+tokens
...
PiperOrigin-RevId: 879159709
2026-03-05 11:21:55 -08:00
Nikhil Dev Goyal
5081341200
Use CappedTag to prevent potential out of bound reads.
...
PiperOrigin-RevId: 879141747
2026-03-05 10:40:52 -08:00
Ray Smith
79e640a956
Fixed tsan error.
...
PiperOrigin-RevId: 879069355
2026-03-05 07:59:38 -08:00
Nikhil Dev Goyal
6721dddf38
Implement FastSigmoid.
...
PiperOrigin-RevId: 878453196
2026-03-04 06:12:33 -08:00
Krzysztof Rymski
539d9bb8e7
Change to use faster exponent function
...
PiperOrigin-RevId: 877981568
2026-03-03 09:16:04 -08:00
Ray Smith
49cb438b1e
Rollback of erroneous rollback.
...
PiperOrigin-RevId: 877376165
2026-03-02 06:50:26 -08:00
Jan Wassenberg
fbd44cee42
Fix Windows warnings
...
PiperOrigin-RevId: 877338937
2026-03-02 04:53:25 -08:00
The gemma.cpp Authors
a3d994915f
No public description
...
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith
16c1b29b89
Rewrote flash attention to use BF16, transpose k and v, use a rewritten task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
...
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
Miguel Lobo
f7f5fd5863
Add ability to load custom models which are fully described by the ModelConfig blob.
...
PiperOrigin-RevId: 877265257
2026-03-02 01:18:33 -08:00
Nikhil Dev Goyal
dd268ddbe8
Add FastGelu activation function in a newly created fast_ops-inl.h file.
...
This replaces the Tanh call with a FastTanh call in the Gelu function written in math-inl.h.
PiperOrigin-RevId: 876339830
2026-02-27 11:14:47 -08:00
Krzysztof Rymski
bdba3bfa63
Remove const to fix Windows builds
...
PiperOrigin-RevId: 876232691
2026-02-27 06:56:54 -08:00