gemma.cpp/examples/simplified_gemma
Jan Wassenberg 0c64987a96 Abort if args are unrecognized, refactor argument passing
This catches typos/incorrect usage.
Refactor: group Loader/Threading/Inference into GemmaArgs.
All *Args ctors now have an extra ConsumedArgs& argument.
PiperOrigin-RevId: 844690553
2025-12-15 03:18:45 -08:00
build Simplified interface class and example for Gemma.cpp usage. 2025-01-28 08:48:27 -08:00
BUILD.bazel Matmul refactoring towards fusion 2025-09-09 07:13:38 -07:00
CMakeLists.txt Adding a simple test for GemmaAttention 2025-12-09 02:13:03 -08:00
README.md Major refactor of allocator/args: 2025-04-10 01:29:54 -07:00
gemma.hpp Abort if args are unrecognized, refactor argument passing 2025-12-15 03:18:45 -08:00
run.cc Abort if args are unrecognized, refactor argument passing 2025-12-15 03:18:45 -08:00

README.md

Simplified Gemma.cpp Example

This is a minimal template project for using gemma.cpp as a library. Instead of an interactive interface, it sets up the model state and generates text for a single hard-coded prompt.

Build steps are similar to those for the main gemma executable. For now, only cmake/make builds are available (PRs welcome for other build options).

First use cmake to configure the project, starting from the simplified_gemma example directory (gemma.cpp/examples/simplified_gemma):

cmake -B build

This sets up a build configuration in gemma.cpp/examples/simplified_gemma/build. Note that this fetches libgemma from a git commit hash on GitHub. Alternatively, to build against the local version of gemma.cpp, use:

cmake -B build -DBUILD_MODE=local

Make sure you delete the contents of the build directory before changing configurations.

Then use make to build the project:

cd build
make simplified_gemma

As with the top-level gemma.cpp project, you can use the make command's -j flag to run parallel build jobs for faster builds.

The gemma.cpp/examples/simplified_gemma/build directory should now contain a simplified_gemma executable. From inside that directory, run it with the same three model arguments as gemma.cpp, which specify the tokenizer, the compressed weights file, and the model type. For example:

./simplified_gemma --tokenizer tokenizer.spm --weights 2b-it-sfp.sbs --model 2b-it

This should print a greeting to the terminal, such as:

"Hello, world! It's a pleasure to greet you all. May your day be filled with joy, peace, and all the things that make your heart soar."

For a demonstration of constrained decoding, add the --reject flag followed by a list of token IDs (note that it must be the last flag, since it consumes every subsequent argument). For example, to reject variations of the word "greeting", run:

./simplified_gemma [...] --reject 32338 42360 78107 106837 132832 143859 154230 190205
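The following is a minimal, self-contained sketch of the idea behind --reject, not the code in run.cc. It assumes every argument after the flag is a token ID and that the sampler checks a per-token predicate. The names ParseRejectTokens and MakeAcceptToken are hypothetical and are not part of the gemma.cpp API.

```cpp
#include <cstdlib>
#include <cstring>
#include <functional>
#include <set>
#include <utility>

// Hypothetical helper: treats every argument after "--reject" as a token ID.
// Because it reads through to the end of argv, --reject must be the last flag.
std::set<int> ParseRejectTokens(int argc, const char* const* argv) {
  std::set<int> reject;
  for (int i = 1; i < argc; ++i) {
    if (std::strcmp(argv[i], "--reject") == 0) {
      for (int j = i + 1; j < argc; ++j) {
        reject.insert(std::atoi(argv[j]));
      }
      break;
    }
  }
  return reject;
}

// Hypothetical predicate factory: returns a callback that a sampler could
// consult for each candidate token. It accepts only tokens that are not in
// the reject set.
std::function<bool(int)> MakeAcceptToken(std::set<int> reject) {
  return [reject = std::move(reject)](int token) {
    return reject.count(token) == 0;
  };
}
```

With a predicate like this, a generation loop would skip any candidate token for which the callback returns false and sample from the remaining tokens instead. How gemma.cpp wires up such a callback internally is not shown here.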