gemma.cpp/ops
Krzysztof Rymski df162ead7c Implementation of tiled attention with bf16 and circular buffers, which reduces memory requirements by 4x for longer contexts on Gemma models.
It also supports better parallelism for small batch sizes / small models.
It can also use VDPBF16PS for a nice 2x improvement on AVX-512.

PiperOrigin-RevId: 874517319
2026-02-24 03:26:49 -08:00
bench_matmul.cc Internal change / remove unused PrintSpeed 2026-01-08 05:26:31 -08:00
dot-inl.h BF16 mixed-mode flash attention 2025-10-29 01:48:28 -07:00
dot_test.cc Add ToFloatSlow, move RandomFloat to test_util 2025-11-27 00:14:51 -08:00
fp_arith-inl.h Decouple MatMul from gemma-inl: precompile for all input types 2025-05-27 07:08:58 -07:00
matmul-inl.h 1.01x speedup: improved autotune 2025-10-27 05:35:31 -07:00
matmul.cc Fix excessive KC/MC from prior change 2025-10-28 05:33:01 -07:00
matmul.h Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
matmul_static-inl.h 1.03x speedup: fused FFN 2025-09-15 10:26:37 -07:00
matmul_static.h Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
matmul_static_bf16.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_f32.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_i8.cc Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
matmul_static_nuq.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_static_sfp.cc Speed up builds by skipping rarely used targets 2025-06-17 05:44:20 -07:00
matmul_test.cc Internal change / remove unused PrintSpeed 2026-01-08 05:26:31 -08:00
ops-inl.h Implementation of tiled attention with bf16 and circular buffers which reduces memory requirements by 4x on longer context on gemma models. 2026-02-24 03:26:49 -08:00
ops.h Added access to softmax attention internals to regular attention 2025-11-21 09:01:01 -08:00
ops_test.cc Added access to softmax attention internals to regular attention 2025-11-21 09:01:01 -08:00
sum-inl.h Minor cleanup, Windows+Bazel build fixes 2024-10-10 09:05:06 -07:00