* Adds and uses a new `AttentionActivationPtrs` struct that holds non-owning `MatPtr`s, acting as a view into `AttentionActivations` (see the sketch after this list).
* Updates `QBatch` to hold non-owning `MatPtr`s to the kv caches.
* Enables the `MatPtrT` default constructor for simpler initializations.
* Passes `LayerWeightsPtrs::query_norm_scale` directly instead of the whole struct. Although `LayerWeightsPtrs` already held non-owning `MatPtr`s, this avoids having to find and construct several empty weight tensors just to build the one `query_norm_scale` tensor.
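
For context, a minimal sketch of the non-owning-view pattern these bullets describe. The types below are hypothetical, simplified stand-ins; the actual gemma.cpp definitions of `MatPtrT`, `AttentionActivations`, and `AttentionActivationPtrs` differ:

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in: a non-owning, typed view of a matrix. The
// default constructor (enabled by this change) leaves it empty, which
// simplifies member initialization of structs that hold many views.
template <typename T>
struct MatPtrT {
  MatPtrT() = default;
  MatPtrT(T* data, size_t rows, size_t cols)
      : data(data), rows(rows), cols(cols) {}
  T* data = nullptr;
  size_t rows = 0, cols = 0;
};

// Owning storage for attention activations (illustrative fields only).
struct AttentionActivations {
  std::vector<float> q_storage, att_storage;  // owns the buffers
  MatPtrT<float> q, att;                      // views over the storage
};

// Non-owning view: copies only the MatPtrT views, never the buffers,
// so it is cheap to construct and pass into per-layer attention code.
struct AttentionActivationPtrs {
  AttentionActivationPtrs() = default;
  explicit AttentionActivationPtrs(const AttentionActivations& a)
      : q(a.q), att(a.att) {}
  MatPtrT<float> q, att;
};

// query_norm_scale is passed directly, rather than reconstructing a
// LayerWeightsPtrs full of empty tensors just to reach one of them.
void Attention(const MatPtrT<float>& query_norm_scale,
               AttentionActivationPtrs activations) {
  // ... use activations.q / activations.att and query_norm_scale ...
}
```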
PiperOrigin-RevId: 824584177
The register-tiled version is about 9.7x faster than the previous attention function on an AVX3-capable machine.
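
For illustration, a minimal scalar sketch of register tiling applied to the attention dot products (several query rows against one key). This is an assumption about the shape of the technique, not the actual kernel, which uses Highway SIMD; `DotTile`, `kTileRows`, and the layout are hypothetical:

```cpp
#include <cstddef>

// Number of query rows processed per tile; the loop body below is
// unrolled for exactly this many accumulators.
constexpr size_t kTileRows = 4;
static_assert(kTileRows == 4, "loop body is unrolled for 4 rows");

// Computes out[r] = dot(q row r, k) for kTileRows rows at once.
// Keeping the accumulators in registers means each load of k[d] is
// reused kTileRows times instead of once per row.
void DotTile(const float* q, size_t q_stride, const float* k, size_t dim,
             float* out) {
  float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
  for (size_t d = 0; d < dim; ++d) {
    const float kd = k[d];  // loaded once, reused for all tile rows
    acc0 += q[0 * q_stride + d] * kd;
    acc1 += q[1 * q_stride + d] * kd;
    acc2 += q[2 * q_stride + d] * kd;
    acc3 += q[3 * q_stride + d] * kd;
  }
  out[0] = acc0; out[1] = acc1; out[2] = acc2; out[3] = acc3;
}
```

Amortizing each key load across all tile rows, and keeping the partial sums out of memory entirely, is the essence of register tiling; the vectorized AVX3 version applies the same idea with SIMD lanes.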
PiperOrigin-RevId: 804913784