mirror of https://github.com/google/gemma.cpp.git
* Adds and uses a new `AttentionActivationPtrs` that holds non-owning `MatPtr`s and acts as a view into `AttentionActivations`.
* Updates `QBatch` to hold non-owning `MatPtr`s to the KV caches.
* Enables the `MatPtrT` default constructor for simpler initialization.
* Pulls out and passes `LayerWeightsPtrs::query_norm_scale` directly. While `LayerWeightsPtrs` already held non-owning `MatPtr`s, this change avoids having to find and construct several empty weight tensors just to construct one `query_norm_scale` tensor.

PiperOrigin-RevId: 824584177
Files in this directory:

- bench_matmul.cc
- dot-inl.h
- dot_test.cc
- fp_arith-inl.h
- matmul-inl.h
- matmul.cc
- matmul.h
- matmul_static-inl.h
- matmul_static.h
- matmul_static_bf16.cc
- matmul_static_f32.cc
- matmul_static_i8.cc
- matmul_static_nuq.cc
- matmul_static_sfp.cc
- matmul_test.cc
- ops-inl.h
- ops.h
- ops_test.cc
- sum-inl.h