Commit Graph

932 Commits

Author SHA1 Message Date
Jan Wassenberg 26bf1e16dc Remove old attention, superseded by flash
PiperOrigin-RevId: 900671724
2026-04-16 05:36:11 -07:00
Ray Smith 221d8df516 Increased max_tbatch_size to kMaxBatchSize. Gives 1.5x speed-up for prefill on both Intel and AMD machines.
Shrank intermediate arrays used in matmul to reduce memory use.

PiperOrigin-RevId: 899579842
2026-04-14 07:36:44 -07:00
Ray Smith a29e2fc655 Fixed bug in PackedBytes - not using override_rows.
PiperOrigin-RevId: 896518708
2026-04-08 08:34:11 -07:00
Copybara-Service 366b143fbf Merge pull request #891 from texasich:clean-gcc15-fix
PiperOrigin-RevId: 895822257
2026-04-07 04:32:10 -07:00
Nikhil Dev Goyal 70513a1e0f Use FastExpMinusOrZero in Softmax().
PiperOrigin-RevId: 895740071
2026-04-07 01:19:39 -07:00
Nikhil Dev Goyal f01cc59218 Reformat
PiperOrigin-RevId: 895729770
2026-04-07 00:56:30 -07:00
texasich f62856eccd Refactor build workflow to improve GCC 15 support and streamline job configurations
2026-04-05 20:20:02 -05:00
texasich 35274a787a Add artifact archiving step to build workflow
2026-04-05 20:15:44 -05:00
texasich 9d912dc2b6 Add support for GCC 15 by disabling AVX10.2 target in Highway
2026-04-05 20:10:56 -05:00
texasich 276052470e Update sentencepiece dependency to latest commit for improved compatibility
2026-04-05 20:10:56 -05:00
Jan Wassenberg 3892763e4e Use HWY_MEMBER_VAR_MAYBE_UNUSED for members
PiperOrigin-RevId: 893428331
2026-04-02 04:19:42 -07:00
Nikhil Dev Goyal 8d2fcb3f12 Replace remaining occurrences of Exp with FastExpMinusOrZero in flash attention.
PiperOrigin-RevId: 891691155
2026-03-30 06:44:56 -07:00
Copybara-Service 0da94e5035 Merge pull request #833 from brendandahl:emscripten-cmake
PiperOrigin-RevId: 890530097
2026-03-27 10:40:21 -07:00
Brendan Dahl b652581bcd Support building with Emscripten
Update CMake configuration and utility functions to enable compilation
with Emscripten. This includes setting Wasm-specific flags like
memory64 and SIMD, implementing platform-specific memory detection, and
adding guards for features like OpenSSL that may be unavailable in a
web environment.
2026-03-27 17:03:35 +00:00
Brendan Dahl 20f2570c96 Fix namespace references in api_client.cc
Qualify color constants and APIClient with the gcpp namespace in
gemma/api_client.cc to resolve potential symbol lookup issues.
2026-03-27 17:01:05 +00:00
Krzysztof Rymski 2344488566 Internal changes
PiperOrigin-RevId: 889294548
2026-03-25 09:46:12 -07:00
Jan Wassenberg c0064bdd6b Warning fix (size_t vs u64 in format string)
PiperOrigin-RevId: 889180151
2026-03-25 05:12:49 -07:00
Krzysztof Rymski f56d18dd68 Improvements to inference using int8 compressed kv's
Multiplication is done using int16*int16 multiplication instructions to avoid expensive conversion to f32/bf16.
2x speed-up on Zen 3.

PiperOrigin-RevId: 888690192
2026-03-24 08:51:30 -07:00
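
Note: a minimal scalar sketch of the idea described in the entry above, not the code from that commit. Each int8 value is widened to int16, the product is formed in integers, and a single conversion to float happens at the end. The name DotInt8 and the use of one scale per vector are assumptions made for illustration. A SIMD version would use int16 multiply-add instructions in place of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from gemma.cpp): dot product of two int8 vectors,
// each with its own float scale (an assumed quantization layout).
inline float DotInt8(const int8_t* a, float scale_a, const int8_t* b,
                     float scale_b, size_t n) {
  assert(a != nullptr && b != nullptr);
  // Each int8*int8 product has magnitude at most 128*128 = 16384, so an
  // int32 accumulator is safe for up to about 131,000 elements.
  int32_t acc = 0;
  for (size_t i = 0; i < n; ++i) {
    const int16_t wa = a[i];  // widen int8 -> int16
    const int16_t wb = b[i];
    acc += static_cast<int32_t>(wa) * wb;  // integer multiply, no f32/bf16
  }
  // One float conversion per dot product instead of one per element.
  return static_cast<float>(acc) * scale_a * scale_b;
}
```

The design point is that the per-element work stays in integer arithmetic; the scales are applied once to the accumulated sum.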
Nikhil Dev Goyal 259b757aef Use Lookup8 and detail::IsFull(d) in FastSigmoid
Fix targeted for scalable architectures

PiperOrigin-RevId: 888633434
2026-03-24 06:36:55 -07:00
Krzysztof Rymski 8a5e37eeb7 Update tests to use the kv_transcoding library to reduce their code size
PiperOrigin-RevId: 888600365
2026-03-24 05:06:01 -07:00
Jan Wassenberg 1dedcfd50d Warning fix: cast enum for HWY_ABORT %d
PiperOrigin-RevId: 886242788
2026-03-19 10:11:17 -07:00
Jan Wassenberg 79f2bf7a07 Disable SVE (except SVE2_128) for MatMul due to compiler crash
PiperOrigin-RevId: 886190686
2026-03-19 08:24:18 -07:00
Nikhil Dev Goyal 90f3de7f15 Use parallel blend chain path in FastSigmoid on architectures having >=32 registers
PiperOrigin-RevId: 886178215
2026-03-19 07:54:05 -07:00
Nikhil Dev Goyal 50144738f1 Change calculation from (ax+b)/(cx+d) to (x + b')/(c'x + d'). This replaces a MulAdd with an Add, reducing port contention on modern CPUs and thus increasing throughput.
Also reduces the need for 1 register to hold b as 1.0 here

PiperOrigin-RevId: 886170146
2026-03-19 07:36:52 -07:00
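
Note: the rewrite in the entry above follows from dividing the numerator and denominator by a, assuming a is nonzero. The primed coefficients below are defined here for illustration; the commit message does not spell them out.

```latex
\[
\frac{ax + b}{cx + d}
  = \frac{x + b/a}{(c/a)\,x + d/a}
  = \frac{x + b'}{c'x + d'},
\qquad
b' = \frac{b}{a},\quad c' = \frac{c}{a},\quad d' = \frac{d}{a},\quad a \neq 0.
\]
```

The numerator then needs only an addition, while the denominator still takes one multiply-add.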
Jan Wassenberg ceb70203f0 Add min_verbosity to MaybePrint
PiperOrigin-RevId: 886094998
2026-03-19 04:22:01 -07:00
Krzysztof Rymski 1a5226e5de Utilities to convert between different encodings of kv cache
PiperOrigin-RevId: 885553004
2026-03-18 06:16:32 -07:00
Nikhil Dev Goyal 0110ddfee7 Fix testing::SrcDir() path resolution in wheat_from_chaff_test
Also use a list of acceptable substring matchers for each question instead of just one

PiperOrigin-RevId: 883198819
2026-03-13 09:17:31 -07:00
Jan Wassenberg 529c201eb6 Add/use MaybePrint; also ShowConfig in non-interactive builds
PiperOrigin-RevId: 882688835
2026-03-12 11:20:41 -07:00
Krzysztof Rymski 197c1a049c Fix int8
PiperOrigin-RevId: 882611833
2026-03-12 08:43:18 -07:00
The gemma.cpp Authors d6e836c651 Add phase markers to stderr for high verbosity levels.
This change introduces `[ BEGIN PHASE: ... ]` and `[ END PHASE: ... ]` messages printed to stderr when `timing_info.verbosity` is 2 or higher. These markers are added around the prefill, generate, image token generation, and final statistics phases to help in profiling and understanding the execution flow.

PiperOrigin-RevId: 882556076
2026-03-12 06:35:25 -07:00
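
Note: one way to emit paired markers like those described in the entry above is a small RAII guard. This is a sketch under stated assumptions, not the code from that commit: the class name PhaseMarker, its constructor parameters, and the exact text placed after "PHASE:" are hypothetical. Only the marker brackets and the verbosity threshold of 2 come from the commit message.

```cpp
#include <cstdio>
#include <string>

// Hypothetical RAII guard (not from gemma.cpp): prints a BEGIN marker on
// construction and an END marker on destruction when verbosity >= 2.
class PhaseMarker {
 public:
  // `out` defaults to stderr, matching the commit message; it is a
  // parameter here only so the output can be redirected.
  PhaseMarker(const char* name, int verbosity, std::FILE* out = stderr)
      : name_(name), enabled_(verbosity >= 2), out_(out) {
    if (enabled_) std::fprintf(out_, "[ BEGIN PHASE: %s ]\n", name_.c_str());
  }
  ~PhaseMarker() {
    if (enabled_) std::fprintf(out_, "[ END PHASE: %s ]\n", name_.c_str());
  }
  PhaseMarker(const PhaseMarker&) = delete;
  PhaseMarker& operator=(const PhaseMarker&) = delete;

 private:
  std::string name_;
  bool enabled_;
  std::FILE* out_;
};
```

Declaring `PhaseMarker marker("prefill", verbosity);` at the top of a scope would bracket that scope with the two markers, including on early return.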
Copybara-Service e728d45d8e Merge pull request #866 from salmanmkc:upgrade-github-actions-node24-general
PiperOrigin-RevId: 882555945
2026-03-12 06:34:32 -07:00
Jan Wassenberg cab77f8dc7 Improved timing for image tokens
Move to TimingInfo, extra newline before profiler

PiperOrigin-RevId: 881943820
2026-03-11 04:47:56 -07:00
Salman Muin Kayser Chishti 3187ee0f85 Upgrade GitHub Actions to latest versions
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
2026-03-11 11:40:58 +00:00
Jan Wassenberg 70cb9cf1c2 Separate profiler output for image token generation
PiperOrigin-RevId: 880895239
2026-03-09 09:26:50 -07:00
Ray Smith bea8b1cdbd Replaced attention in ViT with flash - 8x speedup of image tokenizer on AMD
PiperOrigin-RevId: 880877209
2026-03-09 08:46:04 -07:00
Krzysztof Rymski 029cfd0b33 Int8 + microscaling support for kv cache formats.
Right now multiplication is done by converting to corresponding float format.
Can yield up to 2x improvements for memory-bandwidth-constrained shapes.

PiperOrigin-RevId: 880748493
2026-03-09 02:50:08 -07:00
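
Note: a minimal sketch of block-scaled int8 storage, the general idea behind the entry above, not the format used in that commit. A small block of values shares one scale, and dequantization multiplies each int8 by that scale. The block size of 32, the struct layout, and the plain float scale are assumptions; microscaling formats often use a power-of-two scale instead.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

constexpr size_t kBlockSize = 32;  // assumed block size, for illustration

// Hypothetical layout (not from gemma.cpp): one shared scale per block.
struct Int8Block {
  float scale;
  int8_t values[kBlockSize];
};

// Quantizes kBlockSize floats so that the largest magnitude maps to 127.
inline Int8Block QuantizeBlock(const float* x) {
  float max_abs = 0.0f;
  for (size_t i = 0; i < kBlockSize; ++i) {
    max_abs = std::max(max_abs, std::fabs(x[i]));
  }
  Int8Block block;
  // Avoid division by zero for an all-zero block.
  block.scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;
  for (size_t i = 0; i < kBlockSize; ++i) {
    const long q = std::lround(x[i] / block.scale);
    // Clamp to guard against rounding just past the int8 range.
    block.values[i] = static_cast<int8_t>(std::min(127L, std::max(-127L, q)));
  }
  return block;
}

// Converts one element back to float, as the commit describes doing
// before multiplication.
inline float Dequantize(const Int8Block& block, size_t i) {
  return static_cast<float>(block.values[i]) * block.scale;
}
```

Each stored element takes one byte plus a small amortized share of the scale, which is where a saving on memory-bandwidth-bound shapes would come from.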
Ray Smith d2806fb1dd Fixed msan error by fixing padding of k_cache and v_cache
PiperOrigin-RevId: 879644219
2026-03-06 08:11:17 -08:00
Dani Ferreira Franco Moura d6c7576024 internal change
PiperOrigin-RevId: 879546918
2026-03-06 03:47:11 -08:00
Jan Wassenberg 8d9b9925be Fix VLM prefill batch size - prompt+tokens
PiperOrigin-RevId: 879159709
2026-03-05 11:21:55 -08:00
Nikhil Dev Goyal 5081341200 Use CappedTag to prevent potential out of bound reads.
PiperOrigin-RevId: 879141747
2026-03-05 10:40:52 -08:00
Ray Smith 79e640a956 Fixed tsan error.
PiperOrigin-RevId: 879069355
2026-03-05 07:59:38 -08:00
Nikhil Dev Goyal 6721dddf38 Implement FastSigmoid.
PiperOrigin-RevId: 878453196
2026-03-04 06:12:33 -08:00
Krzysztof Rymski 539d9bb8e7 Change to use faster exponent function
PiperOrigin-RevId: 877981568
2026-03-03 09:16:04 -08:00
Ray Smith 49cb438b1e Rollback of erroneous rollback.
PiperOrigin-RevId: 877376165
2026-03-02 06:50:26 -08:00
Jan Wassenberg fbd44cee42 Fix Windows warnings
PiperOrigin-RevId: 877338937
2026-03-02 04:53:25 -08:00
The gemma.cpp Authors a3d994915f No public description
PiperOrigin-RevId: 877333188
2026-03-02 04:32:29 -08:00
Ray Smith 16c1b29b89 Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
PiperOrigin-RevId: 877308306
2026-03-02 03:11:01 -08:00
Miguel Lobo f7f5fd5863 Add ability to load custom models which are fully described by the ModelConfig blob.
PiperOrigin-RevId: 877265257
2026-03-02 01:18:33 -08:00
Nikhil Dev Goyal dd268ddbe8 Add FastGelu activation function in a newly created fast_ops-inl.h file.
This replaces the Tanh call with FastTanh call in the Gelu function written in math-inl.h.

PiperOrigin-RevId: 876339830
2026-02-27 11:14:47 -08:00
Krzysztof Rymski bdba3bfa63 Remove const to fix Windows builds
PiperOrigin-RevId: 876232691
2026-02-27 06:56:54 -08:00