gemma.cpp

Jan Wassenberg ba6131311a Fix gemma_batch_bench for flash attention. q_T rows do not change. Also repeat prefill to reflect perf after autotuning. PiperOrigin-RevId: 805319377		2025-09-10 05:32:34 -07:00
benchmark.cc	Rename GetModelConfig->Config	2025-07-29 10:18:12 -07:00
benchmark_helper.cc	Cleanup: split CacheInfo from Allocator, MatMul helper functions	2025-09-08 02:23:58 -07:00
benchmark_helper.h	Replace mt19937 with new generator to enable parallel sampling	2025-09-04 23:49:10 -07:00
benchmarks.cc	Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.	2024-10-14 04:45:21 -07:00
cross_entropy.cc	1.15x speedup: parallel sampling, enabled by new RNG	2025-09-05 07:24:02 -07:00
cross_entropy.h	Move MatMulEnv out of Gemma to enable concurrent calls	2025-06-23 01:20:09 -07:00
debug_prompt.cc	Move fields, io* and blob* from compression/ into io/	2025-05-06 11:17:19 -07:00
gemma_batch_bench.cc	Fix gemma_batch_bench for flash attention	2025-09-10 05:32:34 -07:00
gemma_test.cc	Remove Griffin support	2025-09-05 02:35:40 -07:00
prompts.h	Benchmark gemma.cpp with different length inputs.	2024-10-10 15:59:26 -07:00
run_mmlu.cc	Replace mt19937 with new generator to enable parallel sampling	2025-09-04 23:49:10 -07:00