mirror of https://github.com/google/gemma.cpp.git
Parallelize over queries instead of tokens. Introduce non_eos so we only iterate over queries that have not yet reached EOS; remove TokenStreamer. Move RMSNormInplaceBatched out of Transformer to call the latter from prefill. Consistent arg order. Fix gemma_test EOS handling (caught by msan); remove from tokenizer.h. Also add output to gemma_batch_bench, fix name. PiperOrigin-RevId: 769676106
| File | | |
|---|---|---|
| benchmark.cc | ||
| benchmark_helper.cc | ||
| benchmark_helper.h | ||
| benchmarks.cc | ||
| cross_entropy.cc | ||
| cross_entropy.h | ||
| debug_prompt.cc | ||
| gemma_batch_bench.cc | ||
| gemma_test.cc | ||
| prompts.h | ||
| run_mmlu.cc | ||