Add note on attention length and SFP

PiperOrigin-RevId: 738698399
Jan Wassenberg 2025-03-20 00:38:33 -07:00 committed by Copybara-Service
parent 3d419ec173
commit 83219e3c68
2 changed files with 18 additions and 10 deletions


@@ -347,6 +347,12 @@ instruction-tuned and thus does not respond to instructions. Make sure you are
using an instruction-tuned model (`2b-it-sfp`, `2b-it`, `7b-it-sfp`, `7b-it`)
and not a pre-trained model (any model with a `-pt` suffix).
+**What sequence lengths are supported?**
+See `seq_len` in `configs.cc`. For the Gemma 3 models larger than 1B, this is
+typically 32K but 128K would also work given enough RAM. Note that long
+sequences will be slow due to the quadratic cost of attention.
**How do I convert my fine-tune to a `.sbs` compressed model file?**
For PaliGemma (1 and 2) checkpoints, you can use
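
To get a feel for the quadratic-attention warning in the new FAQ entry, here is a rough back-of-the-envelope sketch. The layer, head, and dimension values in it are made-up assumptions for illustration, not Gemma 3's actual configuration.

```cpp
// Back-of-the-envelope scaling for the attention cost mentioned above.
// Illustrative only: the model dimensions are assumptions, not Gemma 3's real config.
#include <cstdio>

int main() {
  const double base_len = 32 * 1024;   // typical seq_len from configs.cc
  const double long_len = 128 * 1024;  // possible given enough RAM

  // Self-attention over the full sequence touches every pair of positions,
  // so the work for a full pass grows with seq_len^2.
  const double attn_ratio = (long_len * long_len) / (base_len * base_len);

  // The KV cache, by contrast, grows only linearly with seq_len.
  // Hypothetical dimensions: 30 layers, 8 KV heads, head_dim 128, 2 bytes/element.
  const double layers = 30, kv_heads = 8, head_dim = 128, bytes_per = 2;
  const double kv_32k = base_len * layers * 2 * kv_heads * head_dim * bytes_per;
  const double kv_128k = long_len * layers * 2 * kv_heads * head_dim * bytes_per;

  std::printf("attention work at 128K vs 32K: %.0fx\n", attn_ratio);
  std::printf("KV cache: %.1f GiB -> %.1f GiB\n",
              kv_32k / (1 << 30), kv_128k / (1 << 30));
  return 0;
}
```

Going from 32K to 128K is a 4x longer sequence, hence roughly 16x the attention work for a full pass but only 4x the KV-cache memory.
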
@@ -373,6 +379,8 @@ pytorch checkpoint. (The code may need updates to work with Gemma-2 models.)
**What are some easy ways to make the model run faster?**
1. Make sure you are using the 8-bit switched floating point `-sfp` models.
+These are half the size of bf16 and thus use less memory bandwidth and cache
+space.
2. If you're on a laptop, make sure power mode is set to maximize performance
and saving mode is **off**. For most laptops, the power saving modes get
activated automatically if the computer is not plugged in.
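
The two lines added to the first tip come down to bytes per weight. A minimal sketch of that arithmetic, assuming an example parameter count rather than any particular model:

```cpp
// Why -sfp models are faster: weight bytes streamed per token, bf16 vs 8-bit SFP.
// The 4B parameter count below is just an example, not a specific model's size.
#include <cstdio>

int main() {
  const double params = 4e9;             // example parameter count
  const double bf16_bytes = params * 2;  // bf16: 2 bytes per weight
  const double sfp_bytes = params * 1;   // SFP: ~1 byte per weight

  // Token generation is typically memory-bandwidth bound: the weights are
  // streamed from RAM once per generated token, so halving the bytes roughly
  // halves the bandwidth needed and doubles how much fits in cache.
  std::printf("bf16: %.1f GB/token  sfp: %.1f GB/token\n",
              bf16_bytes / 1e9, sfp_bytes / 1e9);
  return 0;
}
```
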


@@ -80,7 +80,7 @@ constexpr PromptWrapping kPromptWrapping[] = {
PromptWrapping::PALIGEMMA, PromptWrapping::PALIGEMMA, // PG2 3B 224/448
PromptWrapping::PALIGEMMA, PromptWrapping::PALIGEMMA, // PG2 10B 224/448
PromptWrapping::GEMMA_VLM, // Gemma3 4B
-PromptWrapping::GEMMA_IT, // Gemma3 1B
+PromptWrapping::GEMMA_PT, // Gemma3 1B
PromptWrapping::GEMMA_VLM, // Gemma3 12B
PromptWrapping::GEMMA_VLM, // Gemma3 27B
};
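
For context on the enum change above, here is a hypothetical sketch of what the two wrapping modes imply for prompt handling, using the standard Gemma chat template. `WrapPrompt` is an illustrative helper written for this sketch, not the function used in the repo.

```cpp
// GEMMA_IT wraps user text in the Gemma chat template; GEMMA_PT passes it
// through unchanged (plain continuation for a pre-trained model).
// WrapPrompt is an illustrative helper, not the function used in this repo.
#include <cstdio>
#include <string>

enum class PromptWrapping { GEMMA_IT, GEMMA_PT /* other modes omitted */ };

std::string WrapPrompt(PromptWrapping wrapping, const std::string& user_text) {
  if (wrapping == PromptWrapping::GEMMA_IT) {
    // Standard Gemma instruction-tuned turn structure.
    return "<start_of_turn>user\n" + user_text +
           "<end_of_turn>\n<start_of_turn>model\n";
  }
  return user_text;  // GEMMA_PT: no chat template applied.
}

int main() {
  std::printf("%s\n", WrapPrompt(PromptWrapping::GEMMA_IT, "Hi").c_str());
  std::printf("%s\n", WrapPrompt(PromptWrapping::GEMMA_PT, "Hi").c_str());
  return 0;
}
```
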