Hello World Example
This is a minimal template project for using gemma.cpp as a library. Instead
of an interactive interface, it sets up the model state and generates text for a
single hard-coded prompt.
The build steps are similar to those for the main gemma executable. For now, only
cmake/make is available for builds (PRs welcome for other build options).
First use cmake to configure the project, starting from the hello_world
example directory (gemma.cpp/examples/hello_world):
cmake -B build
This sets up a build configuration in gemma.cpp/examples/hello_world/build.
Note that this fetches libgemma from a git commit hash on GitHub.
Alternatively, if you want to build using the local version of gemma.cpp, use:
cmake -B build -DBUILD_MODE=local
Make sure you delete the contents of the build directory before changing configurations.
Then use make to build the project:
cd build
make hello_world
As with the top-level gemma.cpp project, you can use the make command's -j
flag to run build jobs in parallel for faster builds.
The gemma.cpp/examples/hello_world/build directory should now contain
a hello_world executable. You can run it with the same three model arguments as
gemma.cpp, specifying the tokenizer, compressed weights file, and model type, for
example:
./hello_world --tokenizer tokenizer.spm --compressed_weights 2b-it-sfp.sbs --model 2b-it
This should print a greeting to the terminal:
"Hello, world! It's a pleasure to greet you all. May your day be filled with joy, peace, and all the things that make your heart soar.
For a demonstration of constrained decoding, add the --reject flag followed by
a list of token IDs (note that it must be the last flag, since it consumes every
subsequent argument). For example, to reject variations of the word "greeting",
run:
./hello_world [...] --reject 32338 42360 78107 106837 132832 143859 154230 190205
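To illustrate why --reject must come last, here is a minimal sketch of one way such a flag could be handled. It is not the code in run.cc: ParseRejectTokens and AcceptToken are hypothetical names introduced for illustration, and the assumption is that every argument after --reject is parsed as an integer token ID, which a sampler could then consult before emitting a token.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not part of gemma.cpp): once "--reject" is seen,
// every remaining argument is treated as a token ID. This is why the flag
// has to come last: a later flag such as "--model" would be passed to
// std::stoi, which throws std::invalid_argument on non-numeric input.
std::set<int> ParseRejectTokens(const std::vector<std::string>& args) {
  std::set<int> reject;
  bool collecting = false;
  for (const std::string& arg : args) {
    if (collecting) {
      reject.insert(std::stoi(arg));
    } else if (arg == "--reject") {
      collecting = true;
    }
  }
  return reject;
}

// Hypothetical predicate for constrained decoding: a candidate token is
// accepted only if it is not in the rejected set.
bool AcceptToken(const std::set<int>& reject, int token) {
  return reject.count(token) == 0;
}
```

A generation loop would call AcceptToken on each candidate token and resample or skip any token for which it returns false, so the rejected IDs never appear in the output.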