Small updates to the README file.

PiperOrigin-RevId: 707036429
Daniel Keysers 2024-12-17 04:09:17 -08:00 committed by Copybara-Service
parent 62c70d6715
commit 73766e8ee3
1 changed file with 12 additions and 16 deletions


@@ -4,14 +4,9 @@ gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma
 foundation models from Google.
 
 For additional information about Gemma, see
-[ai.google.dev/gemma](https://ai.google.dev/gemma). Model weights, including gemma.cpp
-specific artifacts, are [available on
-kaggle](https://www.kaggle.com/models/google/gemma).
+[ai.google.dev/gemma](https://ai.google.dev/gemma). Model weights, including
+gemma.cpp specific artifacts, are
+[available on kaggle](https://www.kaggle.com/models/google/gemma).
 
-NOTE: 2024-04-04: if using 2B models, please re-download weights from Kaggle and
-ensure you have the latest version (-mqa or version 3). We are changing the code
-to match the new weights. If you wish to use old weights, change `ConfigGemma2B`
-in `configs.h` back to `kVocabSize = 256128` and `kKVHeads = 8`.
-
 ## Who is this project for?
@@ -23,10 +18,10 @@ deployment-oriented C++ inference runtimes, which are not designed for
 experimentation, and Python-centric ML research frameworks, which abstract away
 low-level computation through compilation.
 
-gemma.cpp provides a minimalist implementation of Gemma-1 and Gemma-2 models,
-focusing on simplicity and directness rather than full generality. This is
-inspired by vertically-integrated model implementations such as
-[ggml](https://github.com/ggerganov/ggml),
+gemma.cpp provides a minimalist implementation of Gemma-1, Gemma-2, and
+PaliGemma models, focusing on simplicity and directness rather than full
+generality. This is inspired by vertically-integrated model implementations such
+as [ggml](https://github.com/ggerganov/ggml),
 [llama.c](https://github.com/karpathy/llama2.c), and
 [llama.rs](https://github.com/srush/llama2.rs).
@@ -226,7 +221,7 @@ Argument | Description | Example value
 ./gemma \
 --tokenizer [tokenizer file] \
 --weights [compressed weights file] \
---weight_type [f32 or bf16 or sfp] \
+--weight_type [f32 or bf16 or sfp (default:sfp)] \
 --model [2b-it or 2b-pt or 7b-it or 7b-pt or ...]
 ```
@@ -239,7 +234,7 @@ Example invocation for the following configuration:
 ```sh
 ./gemma \
 --tokenizer tokenizer.spm \
---weights 2b-it-sfp.sbs --weight_type sfp --model 2b-it
+--weights 2b-it-sfp.sbs --model 2b-it
 ```
 
 ### RecurrentGemma
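
Since sfp is the documented default for `--weight_type`, the example above can
drop the flag entirely; it is still required for non-sfp weights. A minimal
sketch, assuming a hypothetical bf16 weights file name:

```sh
# Hypothetical invocation: --weight_type must be passed explicitly because
# these weights are not sfp (the default). The file name 2b-it-bf16.sbs is
# illustrative, not taken from this commit.
./gemma \
--tokenizer tokenizer.spm \
--weights 2b-it-bf16.sbs --weight_type bf16 --model 2b-it
```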
@@ -263,8 +258,9 @@ Step 1, and run the binary as follows:
 This repository includes a version of the PaliGemma VLM
 ([paper](https://arxiv.org/abs/2407.07726),
-[code](https://github.com/google-research/big_vision/tree/main/big_vision/configs/proj/paligemma)).
-We provide a C++ implementation of this model here.
+[code](https://github.com/google-research/big_vision/tree/main/big_vision/configs/proj/paligemma))
+and its successor PaliGemma 2 ([paper](https://arxiv.org/abs/2412.03555)). We
+provide a C++ implementation of the PaliGemma model family here.
 
 To use the version of PaliGemma included in this repository, build the gemma
 binary as noted above in Step 3. Download the compressed weights and tokenizer
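
To make the PaliGemma wording above concrete, here is a sketch of what an
image-plus-text invocation might look like. The `--image_file` flag, the model
name `paligemma-224`, and the tokenizer and weights file names are assumptions
for illustration, not taken from this commit:

```sh
# Sketch only: the flag, model, and file names below are assumed, not confirmed
# by this diff; the README's PaliGemma section is the authoritative reference.
./gemma \
--tokenizer paligemma_tokenizer.model \
--weights paligemma-3b-mix-224-sfp.sbs \
--model paligemma-224 \
--image_file image.ppm
```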