gemma.cpp

Jan Wassenberg ec02726cf7	6x large-batch, short-prompt prefill speedup	2025-06-10 09:56:20 -07:00

Parallelize over queries instead of tokens.
Introduce non_eos so we only iterate over not-yet-EOS queries; remove TokenStreamer.
Move RMSNormInplaceBatched out of Transformer to call the latter from prefill.
Consistent arg order.
Fix gemma_test EOS handling (caught by msan), remove from tokenizer.h.
Also add output to gemma_batch_bench, fix name.

PiperOrigin-RevId: 769676106

benchmark.cc	Replace RowVectorBatch with MatStorageT	2025-05-12 09:16:12 -07:00
benchmark_helper.cc	1.07x batch decode speedup: more BF16 weights and activations	2025-06-03 23:30:18 -07:00
benchmark_helper.h	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete	2025-05-06 09:12:43 -07:00
benchmarks.cc	Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.	2024-10-14 04:45:21 -07:00
cross_entropy.cc	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
cross_entropy.h	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
debug_prompt.cc	Move fields, io* and blob* from compression/ into io/	2025-05-06 11:17:19 -07:00
gemma_batch_bench.cc	6x large-batch, short-prompt prefill speedup	2025-06-10 09:56:20 -07:00
gemma_test.cc	6x large-batch, short-prompt prefill speedup	2025-06-10 09:56:20 -07:00
prompts.h	Benchmark gemma.cpp with different length inputs.	2024-10-10 15:59:26 -07:00
run_mmlu.cc	Move fields, io* and blob* from compression/ into io/	2025-05-06 11:17:19 -07:00