gemma.cpp/evals
Latest commit 7b55d41f46 by Ray Smith (2026-02-13 01:58:48 -08:00):
Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention.
PiperOrigin-RevId: 868146247
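For context on the commit above (the same message recurs through the listing below): it describes a flash-attention pass that keeps k and v in BF16 and streams over the sequence with an online softmax. Below is a minimal scalar sketch of that idea, not gemma.cpp's implementation; the actual rewrite also transposes the k/v layout for SIMD-friendly access and distributes work across threads, which this single-threaded sketch omits, and all names here are illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// float -> bf16 by truncation (production code would round-to-nearest-even).
static uint16_t ToBF16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return static_cast<uint16_t>(bits >> 16);
}

// bf16 -> float: bf16 is the top 16 bits of an IEEE-754 binary32.
static float FromBF16(uint16_t h) {
  uint32_t bits = static_cast<uint32_t>(h) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// One query against seq_len keys/values held in BF16. The online softmax
// keeps a running max m and normalizer l, so a single pass suffices and no
// seq_len-sized buffer of logits is ever materialized.
static void FlashAttend(const float* q, const uint16_t* k, const uint16_t* v,
                        size_t seq_len, size_t dim, float* out) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(dim));
  float m = -INFINITY, l = 0.0f;
  std::vector<float> acc(dim, 0.0f);
  for (size_t t = 0; t < seq_len; ++t) {
    float s = 0.0f;  // dot(q, k_t), accumulated in f32
    for (size_t d = 0; d < dim; ++d) s += q[d] * FromBF16(k[t * dim + d]);
    s *= scale;
    const float m_new = s > m ? s : m;
    const float corr = std::exp(m - m_new);  // rescales prior accumulation
    const float p = std::exp(s - m_new);     // softmax weight of position t
    for (size_t d = 0; d < dim; ++d)
      acc[d] = acc[d] * corr + p * FromBF16(v[t * dim + d]);
    l = l * corr + p;
    m = m_new;
  }
  for (size_t d = 0; d < dim; ++d) out[d] = acc[d] / l;
}

int main() {
  const size_t seq_len = 4, dim = 2;
  const float q[dim] = {1.0f, 0.5f};
  std::vector<uint16_t> k(seq_len * dim), v(seq_len * dim);
  for (size_t i = 0; i < seq_len * dim; ++i) {
    k[i] = ToBF16(0.1f * static_cast<float>(i));
    v[i] = ToBF16(1.0f + static_cast<float>(i));
  }
  float out[dim];
  FlashAttend(q, k.data(), v.data(), seq_len, dim, out);
  std::printf("out = [%f, %f]\n", out[0], out[1]);
  return 0;
}
```

Keeping k/v in BF16 halves cache and memory-bandwidth pressure relative to f32, while the f32 dot products and accumulators preserve most of the numerical accuracy.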
testdata/: Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. (2026-02-13 01:58:48 -08:00)
benchmark.cc: Abort if args are unrecognized, refactor argument passing (2025-12-15 03:18:45 -08:00)
benchmark_helper.cc: Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. (2026-02-13 01:58:48 -08:00)
benchmark_helper.h: Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. (2026-02-13 01:58:48 -08:00)
benchmarks.cc: Abort if args are unrecognized, refactor argument passing (2025-12-15 03:18:45 -08:00)
cross_entropy.cc: Major cleanup of profiler zones, add Caller annotation for all pool.Run (2025-10-23 01:54:24 -07:00)
cross_entropy.h: Move MatMulEnv out of Gemma to enable concurrent calls (2025-06-23 01:20:09 -07:00)
debug_prompt.cc: Abort if args are unrecognized, refactor argument passing (2025-12-15 03:18:45 -08:00)
gemma_batch_bench.cc: Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. (2026-02-13 01:58:48 -08:00)
gemma_test.cc: Abort if args are unrecognized, refactor argument passing (2025-12-15 03:18:45 -08:00)
prompts.h: Benchmark gemma.cpp with different length inputs. (2024-10-10 15:59:26 -07:00)
run_mmlu.cc: Abort if args are unrecognized, refactor argument passing (2025-12-15 03:18:45 -08:00)
wheat_from_chaff_test.cc: Rewrote flash attention to use BF16, transposed k and v, rewrote the task distribution, increased parallelism on decode, and used double the registers for the core of flash attention. (2026-02-13 01:58:48 -08:00)
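Several rows above cite the "Abort if args are unrecognized, refactor argument passing" change. A minimal sketch of that fail-fast pattern, assuming hypothetical flag names (--weights, --max_tokens); gemma.cpp's actual argument framework is not reproduced here:

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>
#include <unordered_set>

int main(int argc, char** argv) {
  // Hypothetical set of known flags; each takes exactly one value.
  const std::unordered_set<std::string> known = {"--weights", "--max_tokens"};
  for (int i = 1; i < argc; ++i) {
    const std::string arg = argv[i];
    if (known.count(arg)) {
      ++i;  // skip this flag's value
    } else {
      // Fail fast: a typoed flag should stop the run, not be silently ignored.
      std::fprintf(stderr, "Unrecognized argument: %s\n", arg.c_str());
      std::abort();
    }
  }
  std::puts("All arguments recognized.");
  return 0;
}
```

Aborting rather than warning makes a misspelled flag impossible to miss, which matters for benchmark and eval runs whose results would otherwise silently use defaults.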