gemma.cpp/util
Ray Smith 7b55d41f46 Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention.
PiperOrigin-RevId: 868146247
2026-02-13 01:58:48 -08:00
| File | Last commit | Date |
| --- | --- | --- |
| allocator.cc | Warning fix (unused var), Windows build fix (missing member variable) | 2025-10-21 10:17:34 -07:00 |
| allocator.h | Warning fix (unused var), Windows build fix (missing member variable) | 2025-10-21 10:17:34 -07:00 |
| args.h | Abort if args are unrecognized, refactor argument passing | 2025-12-15 03:18:45 -08:00 |
| basics.cc | Replace mt19937 with new generator to enable parallel sampling | 2025-09-04 23:49:10 -07:00 |
| basics.h | Change (old) attention behavior to disallow wraparound, enforced via assertion. | 2025-11-04 11:52:40 -08:00 |
| basics_test.cc | Replace mt19937 with new generator to enable parallel sampling | 2025-09-04 23:49:10 -07:00 |
| mat.cc | Add 8-bit integer quantization (I8Stream) to Gemma.cpp. | 2025-10-15 09:25:20 -07:00 |
| mat.h | Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention. | 2026-02-13 01:58:48 -08:00 |
| test_util.h | Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention. | 2026-02-13 01:58:48 -08:00 |
| threading.cc | Major cleanup of profiler zones, add Caller annotation for all pool.Run | 2025-10-23 01:54:24 -07:00 |
| threading.h | 1.01x speedup: improved autotune | 2025-10-27 05:35:31 -07:00 |
| threading_context.cc | Add tensor stats and output | 2025-12-11 22:52:46 -08:00 |
| threading_context.h | Abort if args are unrecognized, refactor argument passing | 2025-12-15 03:18:45 -08:00 |
| threading_test.cc | Change (old) attention behavior to disallow wraparound, enforced via assertion. | 2025-11-04 11:52:40 -08:00 |
| topology.cc | Avoid warning when OS affinity limits us to the second socket | 2025-12-08 07:10:43 -08:00 |
| topology.h | Avoid warning when OS affinity limits us to the second socket | 2025-12-08 07:10:43 -08:00 |
| zones.cc | Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention. | 2026-02-13 01:58:48 -08:00 |
| zones.h | Rewrote flash attention to use BF16, transpose k and v, rewrote the task distribution, increase parallelism on decode, and use double the registers for the core of flash attention. | 2026-02-13 01:58:48 -08:00 |