mirror of https://github.com/google/gemma.cpp.git
Merge pull request #102 from google:experimental
PiperOrigin-RevId: 616882521
commit 720f609d84

@@ -118,8 +118,7 @@ jax / pytorch / keras for NN deployments.

### The `Gemma` struct contains all the state of the inference engine - tokenizer, weights, and activations

`Gemma(...)` - constructor, creates a gemma model object.
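
As a concrete sketch, construction might look like the following. This is illustrative only - the header name, the `gcpp::Model` enum value, and the exact parameter list are assumptions based on the description above, not the verbatim gemma.cpp signature.

```cpp
// Hedged sketch: build the single object that bundles the tokenizer, weights,
// activations, and KV cache. Paths and parameters are placeholders.
#include <hwy/contrib/thread_pool/thread_pool.h>

#include "gemma.h"  // gcpp::Gemma (header name assumed)

int main() {
  hwy::ThreadPool pool(/*num_threads=*/4);
  gcpp::Gemma model(/*tokenizer_path=*/"tokenizer.spm",
                    /*weights_path=*/"2b-it-sfp.sbs",
                    /*model_type=*/gcpp::Model::GEMMA_2B, pool);
  // ... encode a prompt and call GenerateGemma(...); see the sections below.
  return 0;
}
```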

In a standard LLM chat app, you'll probably use a Gemma object directly; in
more exotic data processing or research applications, you might decompose

@@ -129,11 +128,13 @@ only using a Gemma object.

### Use the tokenizer in the Gemma object (or interact with the Tokenizer object directly)

The Gemma object contains a pointer to a Tokenizer object. The main
operations performed on the tokenizer are to load the tokenizer model from a
file (usually `tokenizer.spm`), call `Encode()` to go from string prompts to
token id vectors, or `Decode()` to go from token id vector outputs from the
model back to strings.
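
A hedged sketch of that round trip follows. `Encode()` and `Decode()` come from the text above; the `Tokenizer()` accessor and the exact signatures are assumptions.

```cpp
#include <iostream>
#include <string>
#include <vector>

// `model` is the gcpp::Gemma constructed earlier; the Tokenizer() accessor
// and bool-returning Encode() are assumptions, not the verbatim API.
void TokenizerRoundTrip(gcpp::Gemma& model) {
  std::string prompt = "Write a haiku about inference engines.";
  std::vector<int> prompt_tokens;
  if (!model.Tokenizer()->Encode(prompt, &prompt_tokens)) {
    std::cerr << "tokenization failed\n";
  }

  // Suppose generation filled `output_tokens` with ids from the model:
  std::vector<int> output_tokens = {/* produced by the model */};
  std::string output_text;
  model.Tokenizer()->Decode(output_tokens, &output_text);
  std::cout << output_text << "\n";
}
```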

### `GenerateGemma()` is the entrypoint for token generation

Calling into `GenerateGemma` with a tokenized prompt will 1) mutate the
activation values in `model` and 2) invoke StreamFunc - a lambda callback for

@@ -150,7 +151,7 @@ constrained decoding type of use cases where you want to force the generation
to fit a grammar. If you're not doing this, you can send an empty lambda as a
no-op which is what `run.cc` does.
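
Concretely, the two callbacks might be wired up as below. This continues the earlier sketches; the `InferenceArgs` type and the parameter order of `GenerateGemma` are assumptions, while StreamFunc and the accept-token hook follow the description above.

```cpp
// Continuing inside main() from the first sketch; `model`, `pool`, and
// `prompt_tokens` come from the earlier sketches (adds <random>, <iostream>).
std::mt19937 gen(/*seed=*/42);

// StreamFunc: invoked once per generated token; return true to keep going.
auto stream_token = [&](int token, float /*probability*/) {
  std::string piece;
  model.Tokenizer()->Decode(std::vector<int>{token}, &piece);
  std::cout << piece << std::flush;
  return true;
};

// Constrained-decoding hook: return false to veto a candidate token so the
// output fits a grammar. With nothing to enforce, accept everything; this is
// the empty no-op lambda that run.cc passes.
auto accept_token = [](int /*token*/) { return true; };

gcpp::InferenceArgs args;  // sampling settings; default construction assumed
GenerateGemma(model, args, prompt_tokens, /*start_pos=*/0, pool,
              stream_token, accept_token, gen, /*verbosity=*/0);
```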

### `Transformer()` implements the inference (i.e. `forward()` method in PyTorch or Jax) computation of the neural network

For high-level applications, you might only call `GenerateGemma()` and never
interact directly with the neural network, but if you're doing something a bit

@@ -36,7 +36,12 @@ For production-oriented edge deployments we recommend standard deployment
pathways using Python frameworks like JAX, Keras, PyTorch, and Transformers
([all model variations here](https://www.kaggle.com/models/google/gemma)).

## Contributing

Community contributions large and small are welcome. See
[DEVELOPERS.md](https://github.com/google/gemma.cpp/blob/main/DEVELOPERS.md)
for additional notes for contributing developers, and [join the Discord by
following this invite link](https://discord.gg/H5jCBAWxAe). This project
follows [Google's Open Source Community
Guidelines](https://opensource.google.com/conduct/).

@@ -0,0 +1,7 @@

# Examples

In this directory are some simple examples illustrating usage of `gemma.cpp` as
a library beyond the interactive `gemma` app implemented in `run.cc`.

- `hello_world/` - minimal/template project for using `gemma.cpp` as a library.
  It sets up the model state and generates text for a single hard-coded prompt
  (a rough sketch of its shape follows below).
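
Pulling the pieces from DEVELOPERS.md together, such a program might take roughly the following shape. This is a sketch of the description above, not the actual example source; the header name, constructor arguments, and `GenerateGemma` parameters are all assumptions.

```cpp
// Rough hello_world-style sketch: load the model, encode one hard-coded
// prompt, and stream the generated text to stdout. Names are illustrative.
#include <iostream>
#include <random>
#include <string>
#include <vector>

#include <hwy/contrib/thread_pool/thread_pool.h>

#include "gemma.h"  // gcpp::Gemma (header name assumed)

int main() {
  hwy::ThreadPool pool(/*num_threads=*/4);
  gcpp::Gemma model("tokenizer.spm", "2b-it-sfp.sbs",
                    gcpp::Model::GEMMA_2B, pool);

  std::vector<int> tokens;
  model.Tokenizer()->Encode("Hello, world! Tell me about gemma.cpp.", &tokens);

  std::mt19937 gen(42);
  auto stream = [&](int token, float /*prob*/) {
    std::string piece;
    model.Tokenizer()->Decode(std::vector<int>{token}, &piece);
    std::cout << piece << std::flush;
    return true;  // keep generating
  };
  auto accept = [](int /*token*/) { return true; };  // no grammar constraints

  gcpp::InferenceArgs args;  // runtime settings; default construction assumed
  GenerateGemma(model, args, tokens, /*start_pos=*/0, pool,
                stream, accept, gen, /*verbosity=*/0);
  return 0;
}
```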

@@ -0,0 +1,3 @@

# Experimental

This directory is for experimental code and features.