Commit Graph

821 Commits

Author SHA1 Message Date
Jan Wassenberg a0ff98ea60 Entirely remove constexpr on PaddedDirEnd. Refs #551
Apparently GCC 9.4 does not handle HWY_CXX17_CONSTEXPR as we intend.

PiperOrigin-RevId: 755967709
2025-05-07 12:48:19 -07:00
Biruk Mammo d9d1709df8 Updates stale references to `compression/migrate_weights`.
PiperOrigin-RevId: 755938143
2025-05-07 11:33:59 -07:00
The gemma.cpp Authors 20757046db cleanup, new conversation methods, bugfixes
- chore: unused parameters cleaned up
- bugfix: explicitly use hwy::Span in GenerateInternal() to prevent runtime crashes due to memory layout incompatibility
- bugfix: explicit nullptr check in LogDebug
- chore: length-related parameters renamed for clarity
- feature: SaveConversation() can optionally be used to save a copy of a conversation that ResetConversation() will rewind to upon request, rather than just an empty KV cache
- feature: GetCurrentConversation() can be used to query the current conversation's name

PiperOrigin-RevId: 755873147
2025-05-07 08:52:44 -07:00
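The save/rewind behavior described in this commit can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `ToyConversation`, `ToyCache`, `Append()` and the `rewind_to_saved` parameter are hypothetical names, the "KV cache" is reduced to a vector of token ids, and the real gemma.cpp signatures may differ.

```cpp
#include <optional>
#include <vector>

// Toy stand-in for a KV cache: just the token ids seen so far (assumption).
using ToyCache = std::vector<int>;

// Hypothetical class for illustration only; not the gemma.cpp API.
class ToyConversation {
 public:
  // Extends the current conversation state by one token.
  void Append(int token) { cache_.push_back(token); }

  // Stores a copy of the current state as the rewind target.
  void SaveConversation() { saved_ = cache_; }

  // Rewinds to the saved copy if the caller requests it and one exists;
  // otherwise falls back to an empty cache.
  void ResetConversation(bool rewind_to_saved) {
    if (rewind_to_saved && saved_.has_value()) {
      cache_ = *saved_;
    } else {
      cache_.clear();
    }
  }

  const ToyCache& Cache() const { return cache_; }

 private:
  ToyCache cache_;
  std::optional<ToyCache> saved_;  // Empty until SaveConversation() is called.
};
```

In this sketch, a caller would save after a shared system prompt, continue the dialogue, and later reset with `rewind_to_saved = true` to return to the saved point instead of starting from nothing.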
Jan Wassenberg e9ecb7794d Fix gcc build error and gemma3 crash, thanks @ufownl, fixes #551
PiperOrigin-RevId: 755729478
2025-05-07 00:59:18 -07:00
Jan Wassenberg c8d92948f4 Move fields, io* and blob* from compression/ into io/
PiperOrigin-RevId: 755445712
2025-05-06 11:17:19 -07:00
Jan Wassenberg 275135d7e8 Rename-only: remove Allocator2 etc suffixes now that refactoring is complete
PiperOrigin-RevId: 755397220
2025-05-06 09:12:43 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Jan Wassenberg a3caf6e5d2 Add summary of optimizations/infra present in the repository
PiperOrigin-RevId: 754838402
2025-05-05 01:46:01 -07:00
Jan Wassenberg fe80f10ed7 Backprop test fixes and allocator cleanup
- Shorten backprop tests to prevent timeout
- Add line number of failing test
- matmul: remove unused enable_bind
- allocator: we will retain enable_bind there
- mat: disable cyclic padding optimization (broken)

PiperOrigin-RevId: 752656068
2025-04-29 03:01:10 -07:00
Jan Wassenberg 160a5824fb Cleanup: include fixes/comments, fix leak, vector reserve
Also remove unused RowSpan
configs.cc: Assign prompt wrapping to ModelConfig
configs.h: simplify EnumValid via sentinel

PiperOrigin-RevId: 750278497
2025-04-22 12:01:46 -07:00
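The sentinel-based validity check mentioned in this commit can be sketched as follows. This is a generic illustration of the technique, not the code in configs.h: the enum `ToyWrapping` is hypothetical, and both the signature of `EnumValid` and the choice to treat the sentinel itself as invalid are assumptions.

```cpp
#include <type_traits>

// Hypothetical enum for illustration; kSentinel must remain the last entry.
enum class ToyWrapping : int { kPlain, kChat, kVision, kSentinel };

// Returns true if `value` lies in [0, kSentinel). Works for any enum that
// starts at zero, has contiguous values, and ends with kSentinel.
template <typename Enum>
constexpr bool EnumValid(Enum value) {
  using Int = std::underlying_type_t<Enum>;
  const Int v = static_cast<Int>(value);
  return v >= 0 && v < static_cast<Int>(Enum::kSentinel);
}

static_assert(EnumValid(ToyWrapping::kChat), "in range");
static_assert(!EnumValid(ToyWrapping::kSentinel), "sentinel is not a value");
static_assert(!EnumValid(static_cast<ToyWrapping>(42)), "out of range");
```

The appeal of this pattern is that adding a new enumerator before `kSentinel` extends the valid range automatically, so the check does not need to list each value.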
The gemma.cpp Authors ba10c88a94 Add C API and C# interop files
This change adds a basic C API that allows access to Gemma functionality from other programming languages. The functionality is exposed via a shared library (DLL on Windows), with C++ interfaces and a basic C# interop wrapper included.

To build the DLL, use the `windows-dll` preset, which includes the C and C++ sources as follows:
```
cmake --preset windows-dll
cmake --build --config Release --preset windows-dll -j 4
```
This should generate a `gemma.dll` in `<build-dir>/Release`.

To build for non-Windows, the appropriate C++ DLL linking will need to be done to generate a shared library for the target OS.

PiperOrigin-RevId: 750246272
2025-04-22 10:35:47 -07:00
Copybara-Service f20da328de Merge pull request #539 from prajwalc22:feature-prompt-flag
PiperOrigin-RevId: 750118715
2025-04-22 03:09:19 -07:00
prajwalc22 2407150f84 Merge branch 'feature-prompt-flag' of github.com:prajwalc22/gemma.cpp into feature-prompt-flag 2025-04-17 23:54:46 +05:30
prajwalc22 a9e56c27eb removed unnecessary threading.h import 2025-04-17 23:44:23 +05:30
Prajwal Choudhari 09dfb144c0 Merge branch 'dev' into feature-prompt-flag 2025-04-17 18:53:28 +05:30
prajwalc22 f55c321397 Address review feedback: Fix prefill_tbatch_size and variable placement issues 2025-04-17 10:15:21 +05:30
prajwalc22 27c28cc938 Address review feedback: Fix prefill_tbatch_size and variable placement issues 2025-04-17 10:15:05 +05:30
Jan Wassenberg 87a658b1c6 Minor cleanup, on-demand NUQ buffer allocation
threading_context: add profiler
compress-inl: add constexpr, on-demand alloc NUQ buffer
gemma_py: model->gemma
Move ScaleWeights to compress.cc
Move PromptWrapping to configs.h
PiperOrigin-RevId: 748347896
2025-04-16 10:49:43 -07:00
prajwalc22 8246e49199 Add non-interactive mode support
- Added prompt flag to InferenceArgs for non-interactive mode
- Set user-facing options to verbosity level 1
- Fixed prompt_size declaration and variable ordering in run.cc
- Properly set prompt_size after WrapAndTokenize calls
- Moved kVerboseLogTokens block after prompt_size is set
2025-04-16 16:26:52 +05:30
prajwalc22 cbf179990f Add --prompt flag for non-interactive mode 2025-04-16 15:34:43 +05:30
prajwalc22 716713f0e6 Update .gitignore to exclude build directory and model files 2025-04-16 09:52:30 +05:30
prajwalc22 01caf379ba Update .gitignore to exclude build directory and model files 2025-04-16 09:45:14 +05:30
prajwalc22 87a1c76578 Update CMake configuration and documentation for --prompt flag 2025-04-16 09:45:14 +05:30
prajwalc22 f3116d2577 Add --prompt flag for non-interactive mode
This change adds a --prompt command-line option that allows users to
provide prompts directly without entering interactive mode, which is
useful for scripting and automation.
2025-04-16 09:45:02 +05:30
The gemma.cpp Authors 7164a5e844 Internal change.
PiperOrigin-RevId: 746953110
2025-04-12 20:27:49 -07:00
Jan Wassenberg 2e722f14f1 Add mmap support (not yet used)
Also: const-correct ArgsBase,
add assert to mat.h checking element_bytes_
BUILD deps update (:shared provides shared.h, not :sfp)
PiperOrigin-RevId: 746073312
2025-04-10 10:03:40 -07:00
Jan Wassenberg 8532da47f7 Major refactor of allocator/args:
use new ThreadingContext2 instead of monostate/init in each frontend
Add ThreadingArgs (replaces AppArgs)

backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride
compress_weights: remove, moving to py-only exporter instead

Move MatPtr to mat.h and revise interface:
- Generic MatOwner
- rename accessors to Packed*
- support stride/row accessors, fix RowPtr stride

Add TypeBits(Type)
Move GenerateMat to test_util-inl for sharing between matmul test/bench
Move internal init to gemma.cc to avoid duplication
Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage
Remove --compressed_weights, use --weights instead.
tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img
Allocator: use normal unique_ptr for AllocBytes so users can call directly
threading: use -> because AlignedPtr no longer assumes arrays
PiperOrigin-RevId: 745918637
2025-04-10 01:29:54 -07:00
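The stride/row accessors mentioned in this commit follow a common pattern: each row may occupy more elements in memory than it has columns, so rows are reached via `row * stride` rather than `row * cols`. A minimal sketch under stated assumptions: `ToyMat` and `SumAll` are hypothetical and do not reflect the actual MatPtr or MatOwner interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix with padded rows; not gemma.cpp's MatPtr.
class ToyMat {
 public:
  // `stride` is the distance between row starts, in elements (>= cols).
  ToyMat(size_t rows, size_t cols, size_t stride)
      : rows_(rows), cols_(cols), stride_(stride), data_(rows * stride, 0.0f) {
    assert(stride >= cols);
  }

  size_t Rows() const { return rows_; }
  size_t Cols() const { return cols_; }
  size_t Stride() const { return stride_; }

  // Row accessors hide the padding: callers only index [0, Cols()).
  float* Row(size_t r) {
    assert(r < rows_);
    return data_.data() + r * stride_;
  }
  const float* Row(size_t r) const {
    assert(r < rows_);
    return data_.data() + r * stride_;
  }

 private:
  size_t rows_;
  size_t cols_;
  size_t stride_;
  std::vector<float> data_;
};

// Sums all logical elements, skipping the padding at the end of each row.
inline float SumAll(const ToyMat& m) {
  float sum = 0.0f;
  for (size_t r = 0; r < m.Rows(); ++r) {
    const float* row = m.Row(r);
    for (size_t c = 0; c < m.Cols(); ++c) sum += row[c];
  }
  return sum;
}
```

Code written against `Row()` keeps working whether or not rows are padded, which is presumably why row-based access is a prerequisite for supporting stride.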
Copybara-Service bef91a3f03 Merge pull request #529 from ufownl:refactor/wrap_and_tokenize
PiperOrigin-RevId: 745174371
2025-04-08 09:22:26 -07:00
Jan Wassenberg 5d4f7e0f7e Add new singleton Allocator2 instead of monostate
Not yet used.

Also fix format-string warning in topology.cc.

PiperOrigin-RevId: 745166210
2025-04-08 09:00:59 -07:00
Jan Wassenberg 4e6aa36e9b Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes
PiperOrigin-RevId: 745068776
2025-04-08 03:35:55 -07:00
RangerUFO cc2e14e654 Improve `GemmaChatTemplate` to handle vision prompt wrapping 2025-03-29 11:31:40 +08:00
RangerUFO c39295f497 Inline the ctor of `GemmaChatTemplate` 2025-03-29 11:31:40 +08:00
RangerUFO d1615b56b2 Fix the prompt wrapping of gemma3-1b again
It seems that the previous fix was changed back due to a merge error.
2025-03-29 11:31:39 +08:00
RangerUFO ca4ee2b63f Refactor `WrapAndTokenize` to work properly with Gemma3 2025-03-29 11:31:39 +08:00
Jan Wassenberg 76a81ac2d6 Fix unaligned buffer causing crash on GCC. Thanks @ufownl, fixes #508
PiperOrigin-RevId: 741590339
2025-03-28 11:25:33 -07:00
Jan Wassenberg e55734219d Fix test threshold and improve warning output
PiperOrigin-RevId: 740738937
2025-03-26 06:11:27 -07:00
Copybara-Service 4a924f1794 Merge pull request #527 from ufownl:feature/gemma2_secondary_eos
PiperOrigin-RevId: 740327973
2025-03-25 06:44:41 -07:00
RangerUFO d42deaa27c Set the secondary EOS for Gemma2
So that we can remove the `<end_of_turn>` filter that was set up
specifically for Gemma2.
2025-03-22 01:32:22 +08:00
RangerUFO 2bad79f110 Fix the EOS checking
The secondary eos is usually `<end_of_turn>`, which can appear in the
prompt, so we can only check for it outside the prompt.
2025-03-22 01:32:22 +08:00
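The rule in this commit message can be sketched as a small predicate. This is an illustration under assumptions, not the gemma.cpp implementation: `ToyEos` and `IsEos` are hypothetical names, and the sketch assumes the primary EOS stops generation at any position, while the secondary EOS is honored only at positions past the prompt.

```cpp
#include <cstddef>

// Hypothetical pair of end-of-sequence token ids (values chosen by caller).
struct ToyEos {
  int primary;
  int secondary;
};

// Returns true if `token` at position `pos` should end generation.
// Positions [0, prompt_size) belong to the prompt.
inline bool IsEos(const ToyEos& eos, int token, size_t pos,
                  size_t prompt_size) {
  // Assumption: the primary EOS is honored everywhere.
  if (token == eos.primary) return true;
  // The secondary EOS may legitimately appear inside the prompt (for example
  // as a turn separator), so it only counts once the prompt has been consumed.
  const bool in_prompt = pos < prompt_size;
  return !in_prompt && token == eos.secondary;
}
```

With a prompt of 10 tokens, a secondary EOS at position 3 is ignored, while the same token at position 10 or later ends generation.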
Jan Wassenberg 6300c123ee Update app argument documentation
PiperOrigin-RevId: 739159864
2025-03-21 06:33:30 -07:00
Phil Culliton 05b1cce9f7 Add support for a secondary EOS token
PiperOrigin-RevId: 738898976
2025-03-20 12:28:31 -07:00
Jan Wassenberg 83219e3c68 Add note on attention length and SFP
PiperOrigin-RevId: 738698399
2025-03-20 00:39:06 -07:00
pculliton 3d419ec173 Merge pull request #523 from ufownl/bugfix/gemma3_1b_wrapping
Fix the prompt wrapping of gemma3-1b
2025-03-19 10:30:27 -04:00
RangerUFO b16ce9a0b4 Fix the prompt wrapping of gemma3-1b 2025-03-18 16:52:38 +08:00
Jan Wassenberg 1b72c22345 Refactor Gemma ctor and improve pool NUMA support
Gemma receives a MatMulEnv arg, with comment on lifetime
Split threading into topology so the latter can be used in allocator
Add AllocClasses() for non-POD (ThreadPool)
Support binding pool to NUMA node
Update threading_test with latency measurements
Also update Highway version.

PiperOrigin-RevId: 736904748
2025-03-14 10:19:00 -07:00
Phil Culliton 1b1b63d560 Fix PaliGemma models.
PiperOrigin-RevId: 736483021
2025-03-13 06:28:29 -07:00
Quirin Niedernhuber 0ff6b3123a Point out Gemma 3 support in README.md
PiperOrigin-RevId: 736125794
2025-03-12 07:33:30 -07:00
Jan Wassenberg 5898fa5eb0 Update github actions/cache version
PiperOrigin-RevId: 736120661
2025-03-12 07:12:55 -07:00
Phil Culliton 4ab601da10 Internal change.
PiperOrigin-RevId: 736015810
2025-03-11 23:20:20 -07:00
Phil Culliton 9d83ff202e Internal change.
PiperOrigin-RevId: 736014152
2025-03-11 23:10:48 -07:00