Commit Graph

  • c0064bdd6b Warning fix (size_t vs u64 in format string) dev Jan Wassenberg 2026-03-25 05:12:19 -0700
  • 3cc9ec1177 Merge 7efeb4fe06 into f56d18dd68 copybara-service[bot] 2026-03-25 12:02:06 +0000
  • 7efeb4fe06 Internal changes test_888724073 Krzysztof Rymski 2026-03-24 10:02:02 -0700
  • f56d18dd68 Improvements to inference using int8 compressed kv's. Multiplication is done using int16*int16 multiplication instructions; avoids expensive conversion to f32/bf16; x2 speed on zen3 Krzysztof Rymski 2026-03-24 08:50:48 -0700
  • 259b757aef Use Lookup8 and detail::IsFull(d) in FastSigmoid. Fix targeted for scalable architectures Nikhil Dev Goyal 2026-03-24 06:36:37 -0700
  • 8a5e37eeb7 Updates to tests to use kv_transcodign library to reduce their code size Krzysztof Rymski 2026-03-24 05:05:35 -0700
  • 1dedcfd50d Warning fix: cast enum for HWY_ABORT %d Jan Wassenberg 2026-03-19 10:10:57 -0700
  • 79f2bf7a07 Disable SVE (except SVE2_128) for MatMul due to compiler crash Jan Wassenberg 2026-03-19 08:20:17 -0700
  • 90f3de7f15 Use parallel blend chain path in FastSigmoid on architectures having >=32 registers Nikhil Dev Goyal 2026-03-19 07:53:40 -0700
  • 50144738f1 Change calculation from (ax+b)/(cx+d) to (x + b')/(c'x + d'). This replaces a MulAdd with Add, reducing port contention on modern CPUs and thus increasing throughput. Also reduces the need for 1 register to hold b as 1.0 here Nikhil Dev Goyal 2026-03-19 07:36:28 -0700
  • ceb70203f0 Add min_verbosity to MaybePrint Jan Wassenberg 2026-03-19 04:21:24 -0700
  • 1a5226e5de Utilities to convert between different encodings of kv cache Krzysztof Rymski 2026-03-18 06:15:57 -0700
  • 0110ddfee7 Fix testing::SrcDir() path resolution in wheat_from_chaff_test. Also use a list of acceptable substring matchers for each question instead of just one Nikhil Dev Goyal 2026-03-13 09:17:14 -0700
  • f78c956ecb Merge 139a8e0964 into 529c201eb6 copybara-service[bot] 2026-03-13 15:54:09 +0000
  • 139a8e0964 Replace remaining occurrences of Exp with FastExpMinusOrZero in flash attention. test_882817324 Nikhil Dev Goyal 2026-03-12 15:47:51 -0700
  • 529c201eb6 Add/use MaybePrint; also ShowConfig in non-interactive builds Jan Wassenberg 2026-03-12 11:20:13 -0700
  • 197c1a049c Fix int8 Krzysztof Rymski 2026-03-12 08:42:56 -0700
  • d6e836c651 Add phase markers to stderr for high verbosity levels. The gemma.cpp Authors 2026-03-12 06:34:19 -0700
  • e728d45d8e Merge pull request #866 from salmanmkc:upgrade-github-actions-node24-general Copybara-Service 2026-03-12 06:34:32 -0700
  • 7b693286d4 Upgrade GitHub Actions for Node 24 compatibility Salman Muin Kayser Chishti 2026-03-11 23:00:49 +0000
  • de3a1f2291 Merge 9772586644 into cab77f8dc7 Salman Chishti 2026-03-11 21:48:17 +0000
  • 9772586644 Upgrade GitHub Actions for Node 24 compatibility Salman Muin Kayser Chishti 2026-03-11 21:48:13 +0000
  • aafb2a98bd Merge 0e0ae5a910 into cab77f8dc7 copybara-service[bot] 2026-03-11 19:28:44 +0000
  • 0e0ae5a910 Internal Change test_882132164 The gemma.cpp Authors 2026-03-11 12:11:12 -0700
  • cab77f8dc7 Improved timing for image tokens Jan Wassenberg 2026-03-11 04:47:36 -0700
  • 3187ee0f85 Upgrade GitHub Actions to latest versions Salman Muin Kayser Chishti 2026-03-11 11:11:48 +0000
  • 70cb9cf1c2 Separate profiler output for image token generation Jan Wassenberg 2026-03-09 09:26:29 -0700
  • bea8b1cdbd Replaced attention in ViT with flash - 8x speedup of image tokenizer on AMD Ray Smith 2026-03-09 08:45:29 -0700
  • 968a5aa87f Change to FastExpMinusOrZero Nikhil Dev Goyal 2026-03-09 03:25:28 -0700
  • 029cfd0b33 Int8 + microscaling support for kv cache formats. Right now multiplication is done by converting to corresponding float format. Can yield up to 2x improvements for membw constrained shapes Krzysztof Rymski 2026-03-09 02:49:47 -0700
  • be511554a9 Fixed msan error by fixing padding of k_cache and v_cache The gemma.cpp Authors 2026-03-07 03:17:44 -0800
  • d2806fb1dd Fixed msan error by fixing padding of k_cache and v_cache Ray Smith 2026-03-06 08:10:47 -0800
  • d6c7576024 internal change Dani Ferreira Franco Moura 2026-03-06 03:46:43 -0800
  • 8d9b9925be Fix VLM prefill batch size - prompt+tokens Jan Wassenberg 2026-03-05 11:19:22 -0800
  • 5081341200 Use CappedTag to prevent potential out of bound reads. Nikhil Dev Goyal 2026-03-05 10:40:22 -0800
  • 79e640a956 Fixed tsan error. Ray Smith 2026-03-05 07:59:16 -0800
  • f9f2d909ed Automated Code Change The gemma.cpp Authors 2026-03-05 02:22:47 -0800
  • 6721dddf38 Implement FastSigmoid. Nikhil Dev Goyal 2026-03-04 06:12:14 -0800
  • 539d9bb8e7 Change to use faster exponent function Krzysztof Rymski 2026-03-03 09:15:42 -0800
  • dc1fb77356 Merge 94648da6f2 into 49cb438b1e Brendan Dahl 2026-03-03 07:53:03 +0100
  • 49cb438b1e Rollback of erroneous rollback. Ray Smith 2026-03-02 06:50:03 -0800
  • fbd44cee42 Fix Windows warnings Jan Wassenberg 2026-03-02 04:53:02 -0800
  • a3d994915f No public description The gemma.cpp Authors 2026-03-02 04:31:45 -0800
  • 16c1b29b89 Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention. Ray Smith 2026-03-02 03:10:40 -0800
  • 43771bd811 Merge 0bce91c4f4 into f7f5fd5863 copybara-service[bot] 2026-03-02 11:00:10 +0000
  • f7f5fd5863 Add ability to load custom models which are fully described by the ModelConfig blob. Miguel Lobo 2026-03-02 01:17:58 -0800
  • 972100e6b3 Merge e0b912fc46 into 3ed403e287 Sascha Ronnie Daoudia 2026-02-28 23:56:21 +0000
  • dd268ddbe8 Add FastGelu activation function in a newly created fast_ops-inl.h file. This replaces the Tanh call with FastTanh call in the Gelu function written in math-inl.h. Nikhil Dev Goyal 2026-02-27 11:14:26 -0800
  • bdba3bfa63 remove const to fix windows builds Krzysztof Rymski 2026-02-27 06:56:36 -0800
  • d8a123e4ec Use a struct to manage the mapping between `AttentionImpl` enum values and their string names, simplifying `GetAttentionImplName` function. Add a test to ensure all valid `AttentionImpl` enums have a corresponding name and can be looked up. Viktor Shipitsin 2026-02-27 01:30:40 -0800
  • 0bce91c4f4 Change to use faster exponent function test_875649021 Krzysztof Rymski 2026-02-26 04:18:00 -0800
  • c6587efe70 Improve instrumentation for ViT parts Jan Wassenberg 2026-02-25 13:10:20 -0800
  • df162ead7c Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models. It also supports better parallelism for small batch sizes / small models. It also is able to utilize VDPBF16PS for nice 2x improvement on avx512 Krzysztof Rymski 2026-02-24 03:26:23 -0800
  • 463a3682be Internal change Jan Wassenberg 2026-02-23 08:55:04 -0800
  • 7dc98902d3 Internal changes Krzysztof Rymski 2026-02-19 01:57:27 -0800
  • 34739fd9f0 Internal changes The gemma.cpp Authors 2026-02-18 04:06:59 -0800
  • b48dd686e6 Undo comment of the test Krzysztof Rymski 2026-02-18 03:28:28 -0800
  • c6696342fa Internal changes Krzysztof Rymski 2026-02-18 03:21:03 -0800
  • 76d7951242 Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data. Replaced flag with attention_impl to control which attention to run. Ray Smith 2026-02-13 06:04:41 -0800
  • 94648da6f2 Support building with Emscripten Brendan Dahl 2026-02-12 00:42:38 +0000
  • a649bc3557 Fix namespace references in api_client.cc Brendan Dahl 2026-02-12 00:36:14 +0000
  • 7e5310b908 Internal changes Krzysztof Rymski 2026-02-09 08:28:46 -0800
  • 56fa6e4839 Internal change plus add U8 type, check MatPtrT type at compile time Jan Wassenberg 2026-02-09 06:53:40 -0800
  • 7c19b31c66 Automated Code Change Hana Joo 2026-02-05 07:16:04 -0800
  • 2751a194be Fix paligemma: must subtract image tokens from prompt length Jan Wassenberg 2026-02-05 05:59:14 -0800
  • 60eed010ba Internal changes Krzysztof Rymski 2026-01-29 04:47:35 -0800
  • ca6d5a88dd build: update CMake paths for io relocation Olamiposi Otesile 2026-01-13 23:55:35 +0100
  • c3c1ed7f00 fix: update header include paths in C++ files Olamiposi Otesile 2026-01-13 23:33:02 +0100
  • 4c56598b74 ci: update bazel build target to gemma_main Olamiposi Otesile 2026-01-13 23:22:13 +0100
  • ec105435bd fix: a global update of all io paths to gemma/io Olamiposi Otesile 2026-01-13 18:55:07 +0100
  • 1c5f712672 update all project references to the new gemma/io path Olamiposi Otesile 2026-01-13 18:50:36 +0100
  • a9ab913196 move io folder inside gemma directory Olamiposi Otesile 2026-01-13 18:34:29 +0100
  • 8d3682d1d3 finalize io path migration Olamiposi Otesile 2026-01-13 17:32:03 +0100
  • a0bb7b5527 Merge branch 'dev' into main Ola Otesile 2026-01-12 06:18:51 -0800
  • b99790450c Restored original filenames. Kept BlobReader to BlobFinder class rename Olamiposi Otesile 2026-01-06 22:04:30 +0100
  • 16a7ba2d6e Internal changes Krzysztof Rymski 2026-01-09 06:35:05 -0800
  • 6d43d6ee19 Build fix for Arm SVE (invalid template arg on op) Jan Wassenberg 2026-01-09 02:55:28 -0800
  • 95592a574e Build fix for Arm SVE (explicit namespace qualification) The gemma.cpp Authors 2026-01-08 13:29:15 -0800
  • 42e9cf557d Internal change / remove unused PrintSpeed Jan Wassenberg 2026-01-08 05:25:54 -0800
  • 384c390181 Allow overriding hardcoded max_seq_len by cmdline argument seq_len. Balazs Racz 2026-01-08 04:28:32 -0800
  • aeade052c6 Move AssertClose to test_util, add U16 Jan Wassenberg 2026-01-07 10:32:44 -0800
  • 2ee1fac74c Internal changes Krzysztof Rymski 2026-01-07 01:21:02 -0800
  • 5579abb4e6 Merge remote-tracking branch 'upstream/dev' Olamiposi Otesile 2026-01-02 22:35:58 +0100
  • 733bbddb7a Refactor: Rename BlobReader to BlobFinder Olamiposi Otesile 2025-12-26 13:48:49 +0100
  • 1605925d1e Add int8 quantization stats Jan Wassenberg 2025-12-19 12:42:29 -0800
  • 11aa16a13d Merge pull request #810 from salmanmkc:upgrade-github-actions-node24 Copybara-Service 2025-12-19 05:27:14 -0800
  • 08a0760271 Internal changes Krzysztof Rymski 2025-12-19 03:42:36 -0800
  • b73a9ede8f Internal changes Krzysztof Rymski 2025-12-19 02:45:52 -0800
  • 0ac55f71ed Avoid using Row() for unaligned storage. Balazs Racz 2025-12-18 05:10:21 -0800
  • 6661d3a60c Internal changes Krzysztof Rymski 2025-12-18 01:26:09 -0800
  • 142e6a7e9c No public description Liam Miller-Cushon 2025-12-17 20:10:24 -0800
  • b8a409dbba Use hn::Sub for vector subtraction in flash attention. Phil Culliton 2025-12-17 12:57:01 -0800
  • 596bdfe5af Separate monolithic gemma_lib library into more specific cc_library targets. Balazs Racz 2025-12-17 03:30:34 -0800
  • a4c78d4454 Merge branch 'dev' into upgrade-github-actions-node24 Salman Chishti 2025-12-16 14:47:59 +0000
  • b66aa115ac Upgrade GitHub Actions for Node 24 compatibility Salman Muin Kayser Chishti 2025-12-16 14:26:24 +0000
  • baa69dfb78 Makes the entire runtime_config passed into the activations constructor. Balazs Racz 2025-12-16 01:56:18 -0800
  • 44dfd69b9b Internal changes Krzysztof Rymski 2025-12-15 07:14:04 -0800
  • 0c64987a96 Abort if args are unrecognized, refactor argument passing Jan Wassenberg 2025-12-15 03:18:11 -0800
  • f50550f4ce Warning fixes (sign mismatch), switch default Jan Wassenberg 2025-12-15 02:40:45 -0800
  • 506fb22be7 No public description Martin Stolle 2025-12-12 06:36:40 -0800