Commit Graph

  • 716713f0e6 Update .gitignore to exclude build directory and model files prajwalc22 2025-04-16 09:52:30 +0530
  • 01caf379ba Update .gitignore to exclude build directory and model files prajwalc22 2025-04-15 08:21:19 +0530
  • 87a1c76578 Update CMake configuration and documentation for --prompt flag prajwalc22 2025-04-15 08:16:02 +0530
  • f3116d2577 Add --prompt flag for non-interactive mode prajwalc22 2025-04-12 13:22:48 +0530
  • 7164a5e844 Internal change. The gemma.cpp Authors 2025-04-12 20:27:14 -0700
  • 2e722f14f1 Add mmap support (not yet used) Jan Wassenberg 2025-04-10 10:02:58 -0700
  • 8532da47f7 Major refactor of allocator/args: Jan Wassenberg 2025-04-10 01:28:16 -0700
  • bef91a3f03 Merge pull request #529 from ufownl:refactor/wrap_and_tokenize Copybara-Service 2025-04-08 09:22:26 -0700
  • 5d4f7e0f7e Add new singleton Allocator2 instead of monostate Jan Wassenberg 2025-04-08 09:00:18 -0700
  • 4e6aa36e9b Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes Jan Wassenberg 2025-04-08 03:35:08 -0700
  • cc2e14e654 Improve `GemmaChatTemplate` to handle vision prompt wrapping RangerUFO 2025-03-27 15:57:53 +0800
  • c39295f497 Inline the ctor of `GemmaChatTemplate` RangerUFO 2025-03-27 14:01:56 +0800
  • d1615b56b2 Fix the prompt wrapping of gemma3-1b again RangerUFO 2025-03-26 18:27:09 +0800
  • ca4ee2b63f Refactor `WrapAndTokenize` to work properly with Gemma3 RangerUFO 2025-03-26 18:19:05 +0800
  • 76a81ac2d6 Fix unaligned buffer causing crash on GCC. Thanks @ufownl, fixes #508 Jan Wassenberg 2025-03-28 11:24:53 -0700
  • 304dc79430 Update runners to ubuntu-24.04 from deprecated ubuntu-20.04 label Bill Napier 2025-03-26 21:27:48 +0000
  • e55734219d Fix test threshold and improve warning output Jan Wassenberg 2025-03-26 06:10:50 -0700
  • 4a924f1794 Merge pull request #527 from ufownl:feature/gemma2_secondary_eos v0.1.4 Copybara-Service 2025-03-25 06:44:41 -0700
  • d42deaa27c Set the secondary EOS for Gemma2 RangerUFO 2025-03-21 19:53:32 +0800
  • 2bad79f110 Fix the EOS checking RangerUFO 2025-03-21 19:26:59 +0800
  • 6300c123ee Update app argument documentation Jan Wassenberg 2025-03-21 06:32:54 -0700
  • 05b1cce9f7 Add support for a secondary EOS token Phil Culliton 2025-03-20 12:27:44 -0700
  • b1032ebf5f Fix PromptWrapping for gemma3 1B, thanks @ufownl Jan Wassenberg 2025-03-20 05:06:45 -0700
  • 83219e3c68 Add note on attention length and SFP Jan Wassenberg 2025-03-20 00:38:33 -0700
  • 3d419ec173 Merge pull request #523 from ufownl/bugfix/gemma3_1b_wrapping pculliton 2025-03-19 10:30:27 -0400
  • b16ce9a0b4 Fix the prompt wrapping of gemma3-1b RangerUFO 2025-03-18 16:52:38 +0800
  • 1b72c22345 Refactor Gemma ctor and improve pool NUMA support Jan Wassenberg 2025-03-14 10:18:11 -0700
  • 1b1b63d560 Fix PaliGemma models. v0.1.3 Phil Culliton 2025-03-13 06:27:52 -0700
  • 0ff6b3123a Point out Gemma 3 support in README.md Quirin Niedernhuber 2025-03-12 07:32:31 -0700
  • 5898fa5eb0 Update github actions/cache version Jan Wassenberg 2025-03-12 07:12:22 -0700
  • 4ab601da10 Internal change. Phil Culliton 2025-03-11 23:19:36 -0700
  • 9d83ff202e Internal change. Phil Culliton 2025-03-11 23:10:08 -0700
  • 7cdb0d3874 Internal change. Phil Culliton 2025-02-28 16:04:54 -0800
  • b00e8a7bcf naming scheme between gemma and gemma2 variants on the command line was not consistent The gemma.cpp Authors 2025-02-18 16:36:48 -0800
  • de5bab65b4 Use a set's `find` method when looking for reject tokens. The gemma.cpp Authors 2025-02-26 08:42:36 -0800
  • 2bdf26d81d Support bf16 output of Matmul Jan Wassenberg 2025-02-25 17:52:50 -0800
  • 1f916b686b Adds: - GemmaContext class that exposes Gemma functionality - C API that uses GemmaContext - C# interop class in GemmaInterop.cs - New END_OF_TURN_ID in tokenizer.h, useful when dealing with instruction-tuned prompts test_730754638 The gemma.cpp Authors 2025-02-24 23:59:12 -0800
  • b3b4b9f92f With new matmul, much larger batch sizes are advantageous, default to 256. Jan Wassenberg 2025-02-24 10:21:21 -0800
  • 9a2360d719 Move batch_bench into test section, add GTest dep. Fixes #501 Jan Wassenberg 2025-02-21 05:33:14 -0800
  • f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning Jan Wassenberg 2025-02-20 08:32:52 -0800
  • ad8dd21e1d Internal change. Phil Culliton 2025-02-14 08:59:08 -0800
  • d854471ae2 Use vectorized TopK using highway VQSelect Apoorv Reddy 2025-02-18 05:00:53 -0800
  • 0e5b59d24d Implements FusedSoftmaxAndSampleTopK. Apoorv Reddy 2025-02-16 21:29:05 -0800
  • bdf5d25e97 Only temporarily enable spinning in threading benchmark Jan Wassenberg 2025-02-14 17:15:10 -0800
  • 06c70dccd9 Less verbose threading_test output, improve formatting. Jan Wassenberg 2025-02-13 00:55:55 -0800
  • f173aa776e Add conversion tool for HF safetensors to gemma.cpp for PaliGemma. Daniel Keysers 2025-02-12 03:46:56 -0800
  • c495b25995 Merge pull request #493 from ufownl:bugfix/compress_weights_le Copybara-Service 2025-02-11 05:10:13 -0800
  • 64cf6dfe0a Using TimingInfo methods and cleaning up args to DecodeStepT Apoorv Reddy 2025-02-11 04:47:39 -0800
  • 953c877658 Fix nuq Enc() to handle groups < kGroupSize. Jan Wassenberg 2025-02-10 07:17:10 -0800
  • 5563d94811 Add fork/join latency benchmark Jan Wassenberg 2025-02-10 05:23:08 -0800
  • 780e376023 Add KVCache.DeepCopy() . Will be useful for implementing sampling functionality like beam sampling, parallel sampling, CoT Decoding (à la https://arxiv.org/abs/2402.10200) Apoorv Reddy 2025-02-10 04:09:54 -0800
  • 9b3e7ea8a2 Factor out DecodeStepT from GenerateT into a separate function. Apoorv Reddy 2025-02-10 03:52:29 -0800
  • b0fe9a43e6 Further speed up blob_compare: single alloc, use dual sockets Jan Wassenberg 2025-02-09 10:53:23 -0800
  • 3a5a6dbcad Fix the link error when building `compress_weights` with Clang on macOS RangerUFO 2025-02-09 00:13:25 +0800
  • b18bd781f6 Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts Jan Wassenberg 2025-02-07 07:38:20 -0800
  • c822957fce Windows build fixes: struct vs class, unused arg/var, avoid VLA Jan Wassenberg 2025-02-06 23:00:47 -0800
  • 82ca526c0c Remove `srcs_version` and `python_version` attributes, as they already default to `"PY3"` Oleh Prypin 2025-02-06 16:50:42 -0800
  • f31e12e63b Improved blob diff: parallel, tolerance for float Jan Wassenberg 2025-02-06 13:45:47 -0800
  • 9f5159ff68 Public visibility for compression/ Jan Wassenberg 2025-02-05 08:53:19 -0800
  • 7ccc6abe87 Allow conversion, loading and inference with NUQ. Phil Culliton 2025-02-05 07:45:18 -0800
  • 8a6edff319 Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests. Phil Culliton 2025-01-31 10:34:57 -0800
  • c5c85e09fd Merge 123bf7eebb into 23dac72463 copybara-service[bot] 2025-01-29 19:58:46 +0000
  • 23dac72463 Simplified interface class and example for Gemma.cpp usage. Phil Culliton 2025-01-28 08:47:55 -0800
  • 7af2e70321 Add python wrappers for configs and inference. Enable building compression/python/compression_test using bazel. Add default image path for image_test and paligemma_test. Daniel Keysers 2025-01-28 08:21:24 -0800
  • bcdb0d65bd Assorted small cleanups. Daniel Keysers 2025-01-28 06:09:08 -0800
  • a248f76245 Allow overriding num threads despite detecting topology Jan Wassenberg 2025-01-27 08:57:08 -0800
  • e997468496 Apply PositionalEncodingQK always in-place. Daniel Keysers 2025-01-23 07:08:50 -0800
  • ce807a31a1 internal change Apoorv Reddy 2025-01-23 05:28:51 -0800
  • a60b564b88 Infra improvements (2) Jan Wassenberg 2025-01-23 01:54:50 -0800
  • f37402da57 Add parameter for base_frequency to CreateInvTimeScale(). Extract a few local variables to make code easier to read (hopefully). Daniel Keysers 2025-01-23 00:56:04 -0800
  • a133b3d062 Tiny fix: align template parameter order with parameter order. Daniel Keysers 2025-01-22 09:12:55 -0800
  • 9646edc908 Internal change Phil Culliton 2025-01-21 07:53:22 -0800
  • f46052b5b4 Merge pull request #473 from ufownl:bugfix/migrate_weights_target Copybara-Service 2025-01-20 08:05:38 -0800
  • c4398fc72d Infra improvements: Jan Wassenberg 2025-01-20 06:22:17 -0800
  • 20e5ef6d2e Add the missing `migrate_weights` target for CMake RangerUFO 2025-01-17 18:56:43 +0800
  • 493688f6f1 Allow interactive use with new single-file weight format. Add section about new weights format to README.md. Remove model_type_required parameter. Update error handling for flags. Daniel Keysers 2025-01-15 07:22:00 -0800
  • b93231a47d Moved the vit config fields to their own config struct Ray Smith 2025-01-15 01:09:16 -0800
  • 9d40f0117e Added ability to load/save a complete model file, including tokenizer. Ray Smith 2024-12-19 07:59:08 -0800
  • 29e3a1bba9 Merge 51a708e957 into 5bc356f18f Nanubala Gnana Sai 2024-12-18 12:16:32 +0000
  • 5bc356f18f Internal change The gemma.cpp Authors 2024-12-17 15:15:21 -0800
  • 73766e8ee3 Small updates to the README file. Daniel Keysers 2024-12-17 04:09:17 -0800
  • 62c70d6715 Rename ModelTraining to PromptWrapping which is a more accurate name. Daniel Keysers 2024-12-13 07:45:25 -0800
  • 6254f2e5ca Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT Ray Smith 2024-12-11 06:29:57 -0800
  • aed17396be Make prompt wrapping more consistent and fix duplicated tokens for multi-turn. Do not echo <end_of_turn> tokens to the user. Have verbosity=0 only show the dialog. Daniel Keysers 2024-12-11 01:51:29 -0800
  • e69bc3bc1c Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future. Corrected some errors in the TensorIndex. Ray Smith 2024-12-11 01:26:05 -0800
  • 7b77909427 Fix unhandled switch warning/error Jan Wassenberg 2024-12-10 13:32:21 -0800
  • 642fc97d51 Internal change Jan Wassenberg 2024-12-10 06:57:59 -0800
  • d8135e836f Merge pull request #460 from ericcurtin:common Copybara-Service 2024-12-10 06:33:37 -0800
  • 5bbe814a53 Tiny cleanup. Daniel Keysers 2024-12-10 03:33:25 -0800
  • 331d2ccc02 Add support for 448px resolution to PaliGemma and PaliGemma2. Daniel Keysers 2024-12-09 11:37:37 -0800
  • a971088ac2 Refactor `gemma/common.cc` to improve readability and safety Eric Curtin 2024-12-08 17:30:17 -0300
  • 278f2d148f Refactor `gemma/common.cc` to improve readability and safety Eric Curtin 2024-12-08 17:30:17 -0300
  • 66bb435121 No public description The gemma.cpp Authors 2024-12-09 00:48:59 -0800
  • 9dfe2a76be Internal change Phil Culliton 2024-12-04 20:41:07 -0800
  • 6a34e9c547 Print cache info and update Highway version for that Jan Wassenberg 2024-12-03 06:31:15 -0800
  • f74d496879 Threading/infra improvements. Jan Wassenberg 2024-11-27 01:11:20 -0800
  • 51a708e957 Merge branch 'dev' into feature/ISS-60/implement-self-extend Nanubala Gnana Sai 2024-11-25 19:08:50 +0530
  • 109a4d9f85 Add a simple benchmark for batching. Stanko Novakovic 2024-11-21 10:59:16 -0800
  • 3d1625d8c5 Improved consistency of compressor API, and added a universal method with a target type arg. Moved configs pybind up to root level. Ray Smith 2024-11-21 05:27:02 -0800
  • e8601b2415 Merge branch 'dev' into feature/ISS-60/implement-self-extend Nanubala Gnana Sai 2024-11-19 23:41:45 +0530