Commit Graph

15 Commits

Author SHA1 Message Date
Krzysztof Rymski 0bce91c4f4 Change to use faster exponent function
PiperOrigin-RevId: 875649021
2026-02-26 07:04:02 -08:00
Krzysztof Rymski df162ead7c Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models.
It also supports better parallelism for small batch sizes / small models.
It also is able to utilize VDPBF16PS for nice 2x improvement on avx512

PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
Krzysztof Rymski 7dc98902d3 Internal changes
PiperOrigin-RevId: 872280443
2026-02-19 01:57:58 -08:00
The gemma.cpp Authors 34739fd9f0 Internal changes
PiperOrigin-RevId: 871792281
2026-02-18 04:07:36 -08:00
Krzysztof Rymski c6696342fa Internal changes
PiperOrigin-RevId: 871776998
2026-02-18 03:21:41 -08:00
Ray Smith 76d7951242 Added wheat_from_chaff_test to test the ability of a model to find a needle in a haystack of data.
Replaced flag with attention_impl to control which attention to run.

PiperOrigin-RevId: 869694868
2026-02-13 06:05:30 -08:00
Balazs Racz baa69dfb78 Makes the entire runtime_config passed into the activations constructor.
PiperOrigin-RevId: 845153671
2025-12-16 01:56:52 -08:00
Krzysztof Rymski 44dfd69b9b Internal changes
PiperOrigin-RevId: 844759322
2025-12-15 07:14:37 -08:00
Martin Stolle 5a500872b8 Internal change
PiperOrigin-RevId: 835115693
2025-11-21 01:17:45 -08:00
Martin Stolle 35e9f9f05f Introduce attention implementation configurability.
PiperOrigin-RevId: 828971705
2025-11-06 08:43:41 -08:00
Biruk Mammo 5a05857deb [Gemma.cpp] Allows non-owned arguments for attention methods.
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtrs`. Acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold  non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids the need to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.

PiperOrigin-RevId: 824584177
2025-10-27 10:43:25 -07:00
Ray Smith fb6fa793f4 Added a global (to gemma) zones list to enable most call sites to PROFILER_ZONE3 to avoid the sychronization required for the static const initialization of the zone handle.
Improved flash_attention to enable profiling using the new zones.

PiperOrigin-RevId: 819235421
2025-10-14 08:30:58 -07:00
Ray Smith 2f6cbde8ff Added a smaller tile size to flash attention for smaller batch sizes
PiperOrigin-RevId: 813226193
2025-09-30 05:49:20 -07:00
Jan Wassenberg 501fdf000e Remove no longer used MatVec
PiperOrigin-RevId: 809059409
2025-09-19 09:03:22 -07:00
Ray Smith f10ac41a20 Added flash attention, with both a single-q function, and a register-tiled function.
The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine.

PiperOrigin-RevId: 804913784
2025-09-09 08:05:26 -07:00