Further 1.02x prefill speedup from batch 64->512

Measured on SKX. Larger speedup expected for Zen4/SPR.

PiperOrigin-RevId: 652472928
2024-07-15 07:25:25 -07:00 · 2024-07-15 07:25:25 -07:00 · cd530374b3
parent aaee666a1d
commit cd530374b3
1 changed file with 1 addition and 1 deletion
--- a/gemma/common.h
+++ b/gemma/common.h
@@ -36,7 +36,7 @@ ByteStorageT AllocateSizeof() {
  return hwy::AllocateAligned<uint8_t>(sizeof(T));
 }

-constexpr size_t kPrefillBatchSize = 64;
+constexpr size_t kPrefillBatchSize = 512;
 constexpr size_t kDecodeBatchSize = 1;
 constexpr size_t kBatchedQueryBatchSize = 16;
 constexpr size_t kMinAdjustedPrefillBatchSize =