gemma.cpp/util
Jan Wassenberg aaf51898b6 Major revamp #2 of Prefill: fix token order, parallel for multi-query
- Allocate only the required KV caches and activation batch size
- Add flags for batch sizes
- Const-correct interface: Span of const int.
- Also clean up the KVCache arg to a span.
- Move kPrefillBatchSize into RuntimeConfig and remove related global constants.

PiperOrigin-RevId: 655893197
2024-07-25 03:28:55 -07:00
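The commit message lists interface changes but shows no code. The C++ sketch below illustrates the general pattern under stated assumptions. Prompts are passed as read-only `Span<const int>`. Per-query KV caches are passed as a span. The prefill batch size is a runtime setting rather than a global constant. Tokens are consumed in their original order. `Span`, `RuntimeConfig::prefill_tbatch_size`, `KVCache`, and `Prefill` here are hypothetical stand-ins, not gemma.cpp's actual types or signatures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view (pointer + length). Span<const int> lets the
// callee read tokens without being able to modify them.
template <typename T>
struct Span {
  T* ptr = nullptr;
  size_t num = 0;
  size_t size() const { return num; }
  T& operator[](size_t i) const { return ptr[i]; }
};

// Hypothetical stand-in for a runtime config: the prefill batch size is a
// field here instead of a global kPrefillBatchSize constant.
struct RuntimeConfig {
  size_t prefill_tbatch_size = 4;
};

// Hypothetical per-query KV cache; this sketch only records prefilled tokens.
struct KVCache {
  std::vector<int> tokens;
};

// Prefills each query's prompt into its own KV cache, one cache per query.
// Tokens are consumed in their original order, at most prefill_tbatch_size
// at a time. Queries are independent, so they could run in parallel; this
// sketch processes them sequentially. Returns the number of batches processed.
size_t Prefill(Span<const Span<const int>> prompts, Span<KVCache> kv_caches,
               const RuntimeConfig& config) {
  assert(kv_caches.size() >= prompts.size());  // need one cache per query
  const size_t batch = std::max<size_t>(1, config.prefill_tbatch_size);
  size_t num_batches = 0;
  for (size_t q = 0; q < prompts.size(); ++q) {
    const Span<const int> prompt = prompts[q];
    for (size_t start = 0; start < prompt.size(); start += batch) {
      const size_t end = std::min(start + batch, prompt.size());
      for (size_t i = start; i < end; ++i) {
        kv_caches[q].tokens.push_back(prompt[i]);  // preserve token order
      }
      ++num_batches;
    }
  }
  return num_batches;
}
```

With `prefill_tbatch_size = 2`, a 5-token prompt is split into batches of 2, 2, and 1 tokens. A 2-token prompt is handled as a single batch. Only as many caches as there are queries need to exist.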
File | Last commit | Date
app.h | Major revamp #2 of Prefill: fix token order, parallel for multi-query | 2024-07-25 03:28:55 -07:00
args.h | Lint fix - string append, remove stale TODO | 2024-07-08 04:11:21 -07:00