Commit Graph

  • fea9a07d9b Avoid affinity related warnings on Apple. Refs #625 Jan Wassenberg 2025-07-03 08:21:52 -0700
  • e1585ecaf5 Update Highway version to get NEON bf16 fix Jan Wassenberg 2025-06-23 01:24:25 -0700
  • a04cc287b2 Move MatMulEnv out of Gemma to enable concurrent calls Jan Wassenberg 2025-06-23 01:19:39 -0700
  • 0f70f285e0 1.1x prefill and decode speedup (attention/activations) Jan Wassenberg 2025-06-20 08:59:23 -0700
  • 7630ec0c92 batch_bench tweak: more output Jan Wassenberg 2025-06-20 06:08:36 -0700
  • 4f5785b0fd Update instrumentation for new Highway wall-time profiler Jan Wassenberg 2025-06-19 07:45:30 -0700
  • 1665ecc5c2 Remove CMake max version, fixes #623 Jan Wassenberg 2025-06-19 02:29:35 -0700
  • 9e47940dde Merge 73863c1c10 into 834cbe5b39 copybara-service[bot] 2025-06-19 09:19:58 +0000
  • 73863c1c10 Remove CMake max version, fixes #623 Jan Wassenberg 2025-06-19 02:17:32 -0700
  • 834cbe5b39 linkstatic in most tests/binaries, remove fully_static_link Jan Wassenberg 2025-06-19 01:45:10 -0700
  • 7f62c2606e Fix bf16 KV recompression and Rope(), fixes #608 Jan Wassenberg 2025-06-18 09:13:47 -0700
  • 88284387db Reduce warning noise. Biruk Mammo 2025-06-18 09:00:47 -0700
  • 343482c7ef 1.02x batch decode speedup: BF16 KV cache Jan Wassenberg 2025-06-17 23:21:24 -0700
  • 606e22155a Gemma CPP: move PaliGemma tests' helper to a separate class Mukund Aggarwal 2025-06-17 18:36:52 -0700
  • f2adbfbcab Batch inference fixes: set pos during prefill, fix assert Jan Wassenberg 2025-06-17 07:09:00 -0700
  • d342e4e7d4 Also add CMAKE_CXX_STANDARD in examples' CMake files Jan Wassenberg 2025-06-17 06:53:02 -0700
  • cd80d8b24d Speed up builds by skipping rarely used targets Jan Wassenberg 2025-06-17 05:43:25 -0700
  • 9a02d6be68 Add --prompt_file and testdata for it. Refs #608 Jan Wassenberg 2025-06-16 23:40:28 -0700
  • 31d2b231af Update PaliGemma Kaggle link to point to v2 Jan Wassenberg 2025-06-16 23:24:22 -0700
  • 5f3797f6e1 Allow creating empty `AttentionActivations` for experimental code. Biruk Mammo 2025-06-16 10:18:35 -0700
  • 6773e4517c Split Activations into Griffin/Attention to reduce memory usage for attention-only tests. Jan Wassenberg 2025-06-16 07:52:27 -0700
  • 2128d076db Merge pull request #612 from ufownl:feature/allqueries_append Copybara-Service 2025-06-16 06:52:43 -0700
  • 7aac765e96 Add `Append` method to `AllQueries` RangerUFO 2025-06-16 20:39:27 +0800
  • e5c81f64a1 Major refactor: clarify query_idx (global) vs qi. Refs #607 Jan Wassenberg 2025-06-16 02:41:30 -0700
  • f654239302 Merge 035a6ee61e into 2c72ff2aa5 copybara-service[bot] 2025-06-16 09:31:20 +0000
  • 035a6ee61e Major refactor: clarify query_idx (global) vs qi. Refs #607 Jan Wassenberg 2025-06-12 03:58:48 -0700
  • 2c72ff2aa5 Fix MatMul issue caused by autotuning bucketing, refs #608, thanks @ufownl Jan Wassenberg 2025-06-13 06:58:05 -0700
  • b30a1d260e Merge a67ba08cde into 01cdefeda7 copybara-service[bot] 2025-06-13 13:48:12 +0000
  • a67ba08cde Fix MatMul issue caused by autotuning bucketing, refs #608, thanks @ufownl Jan Wassenberg 2025-06-13 04:50:48 -0700
  • 5abacb8f4a Move `pos += 1` into `StreamAndUpdateEOS` RangerUFO 2025-06-12 12:50:55 +0800
  • 35abb3f1fd Fix the issue of duplicate `pos` RangerUFO 2025-06-12 00:08:48 +0800
  • 01cdefeda7 1.64x batch=1 prefill speedup: nested parallelization for Attention Jan Wassenberg 2025-06-11 11:28:14 -0700
  • c027a45a2e MatPtr-ify KV, shared div_seq_len, --seq_len flag Jan Wassenberg 2025-06-11 09:48:48 -0700
  • bd98b43cea Rename RowPtr->StridedView, CRows->RowPtrs Jan Wassenberg 2025-06-11 02:29:41 -0700
  • b84149310b Fix paligemma, update its test Jan Wassenberg 2025-06-11 02:11:29 -0700
  • ec02726cf7 6x large-batch, short-prompt prefill speedup Jan Wassenberg 2025-06-10 09:55:51 -0700
  • d7b23d532a Restructure internal initialization. Daniel Keysers 2025-06-10 01:24:52 -0700
  • 824a95793c Fix Image::WriteBinary() writing values to a file one at a time. Rhett Stucki 2025-06-06 00:47:29 -0700
  • 6ee628ba38 Further cleanup: separate MatMulEnv arg Jan Wassenberg 2025-06-05 20:47:57 -0700
  • e774ddbaaa Github test: disable failing ubuntu-20.04 Jan Wassenberg 2025-06-05 10:30:00 -0700
  • 0e2cab5187 Avoid warning about inability to map, unless explicitly requested Jan Wassenberg 2025-06-05 09:09:26 -0700
  • 3a266c662c Split gemma-inl into separate source files Jan Wassenberg 2025-06-05 05:36:08 -0700
  • dd7d4a7717 Optimize Image::GetPatch() to copy rows instead of pixels at a time. The gemma.cpp Authors 2025-06-04 22:30:34 -0700
  • eff0213e88 Merge pull request #593 from ufownl:bugfix/dc2bf16 Copybara-Service 2025-06-04 05:21:54 -0700
  • a82f8d5690 Fix compilation error on G++ 9.4 RangerUFO 2025-06-04 17:39:37 +0800
  • 6897313080 3x speedup of EmbedImagePatches - GEMM, not GEMV. Jan Wassenberg 2025-06-04 01:18:20 -0700
  • 9f74a1a098 Fix a problem in run_example.py Daniel Keysers 2025-06-04 00:42:15 -0700
  • 9efdcfd45c 1.07x batch decode speedup: more BF16 weights and activations Jan Wassenberg 2025-06-03 23:28:57 -0700
  • 839a642992 Fix paligemma_test, refs #588 Jan Wassenberg 2025-06-03 04:44:50 -0700
  • 209009b57e Merge pull request #588 from ufownl:bugfix/vit_attn Copybara-Service 2025-06-03 00:43:30 -0700
  • ad3002a21c Merge branch 'dev' into bugfix/vit_attn Jan Wassenberg 2025-06-03 09:29:52 +0200
  • 794a21a4e6 Major refactor to de-templatize gemma-inl and weights Jan Wassenberg 2025-06-02 23:00:47 -0700
  • 93de2be938 Fix the broken VitAttention RangerUFO 2025-06-03 12:37:18 +0800
  • cf4d7ceb82 1.16x decode speedup: remove last MatVec in Attention Jan Wassenberg 2025-06-02 09:39:57 -0700
  • c4a75abe43 Cleanup gemma_batch_bench Jan Wassenberg 2025-06-02 07:03:59 -0700
  • a3f7bf0991 Fix thread name when skipping packages/clusters Jan Wassenberg 2025-06-01 23:49:35 -0700
  • 0023ff8770 Add support for arbitrary output row pointers Jan Wassenberg 2025-05-31 10:55:12 -0700
  • 9c3e089b09 Internal change. The gemma.cpp Authors 2025-05-30 09:18:08 -0700
  • 1e8642f8f4 Internal change. The gemma.cpp Authors 2025-05-29 22:50:42 -0700
  • 3890eb5412 Remove backprop/ Jan Wassenberg 2025-05-28 07:00:44 -0700
  • 627cc04db9 Decouple MatMul from gemma-inl: precompile for all input types Jan Wassenberg 2025-05-27 07:08:23 -0700
  • 421a2ab8ac Add comments explaining non-padded tensors, kNoPad -> kPacked Jan Wassenberg 2025-05-26 03:03:05 -0700
  • eb8a463038 Merge pull request #574 from ufownl:bugfix/vit_weights Copybara-Service 2025-05-22 07:04:53 -0700
  • 2771f463f9 Fix the ViT weights loading RangerUFO 2025-05-22 12:13:29 +0800
  • 1ce89788ef Merge pull request #573 from ufownl:bugfix/vit Copybara-Service 2025-05-21 01:58:00 -0700
  • 6debdbe341 Minor fixes for ViT RangerUFO 2025-05-20 14:20:57 +0800
  • cb188d4a0e Fix RowT issue and improve Griffin (currently still broken) Jan Wassenberg 2025-05-19 07:01:29 -0700
  • d6cfabc2c1 Shorten gemma_test so we can run it for more models. Jan Wassenberg 2025-05-16 11:14:01 -0700
  • e890d46f30 1.31x batch prefill, 1.24x batch decode speedup: NUMA binding Jan Wassenberg 2025-05-16 07:41:36 -0700
  • c443adee33 3.8x speedup of weights loading via preadv on Linux Jan Wassenberg 2025-05-15 11:54:38 -0700
  • 38a08d8095 Replace last ConstMat with MatPtr Jan Wassenberg 2025-05-13 10:54:48 -0700
  • 0a6a7e4cd6 Merge pull request #566 from ufownl:bugfix/deduced_model_wrapping Copybara-Service 2025-05-13 10:28:16 -0700
  • 30ad625f42 Fix the wrapping field of the deduced model config RangerUFO 2025-05-13 17:06:38 +0800
  • 8a312e9b89 Split W1/W2 as a load-time preprocess. Jan Wassenberg 2025-05-13 07:39:16 -0700
  • 2038dfd9cc Minor: rename compression/shared -> types.h Jan Wassenberg 2025-05-13 06:52:46 -0700
  • d538a6d6c6 Cleanup: remove unused kCyclic, remove 2 suffix Jan Wassenberg 2025-05-13 01:05:42 -0700
  • ba21e3beb4 Adds a `GemmaAttention` constructor that takes an explicit `ThreadingContext`. Biruk Mammo 2025-05-12 11:16:29 -0700
  • 45ad847a41 Replace RowVectorBatch with MatStorageT Jan Wassenberg 2025-05-12 09:15:03 -0700
  • cf7dd80c17 Minor: mark command line flags as required Jan Wassenberg 2025-05-12 08:30:04 -0700
  • 37f2c3b951 Add feature ydbgh232l2ukpxdkouazxbv5 ABRAHEM 2025-05-10 07:50:36 +0300
  • 252a4e955e Remove support for Gemma 1 and PaliGemma 1 models, superseded by (Pali)Gemma 2. Jan Wassenberg 2025-05-09 02:16:55 -0700
  • d834c07042 Exposes `GemmaAttention::DotSoftmaxWeightedSum` for experimentation. Biruk Mammo 2025-05-08 09:18:02 -0700
  • a0ff98ea60 Entirely remove constexpr on PaddedDirEnd. Refs #551 Jan Wassenberg 2025-05-07 12:47:40 -0700
  • d9d1709df8 Updates stale references to `compression/migrate_weights`. Biruk Mammo 2025-05-07 11:33:23 -0700
  • 20757046db cleanup, new conversation methods, bugfixes The gemma.cpp Authors 2025-05-07 08:52:04 -0700
  • e9ecb7794d Fix gcc build error and gemma3 crash, thanks @ufownl, fixes #551 Jan Wassenberg 2025-05-07 00:58:45 -0700
  • c8d92948f4 Move fields, io* and blob* from compression/ into io/ Jan Wassenberg 2025-05-06 11:16:24 -0700
  • 275135d7e8 Rename-only: remove Allocator2 etc suffixes now that refactoring is complete Jan Wassenberg 2025-05-06 09:12:05 -0700
  • 8d0882b966 Huge refactor of weight handling and model loading. Jan Wassenberg 2025-05-06 04:43:48 -0700
  • a3caf6e5d2 Add summary of optimizations/infra present in the repository Jan Wassenberg 2025-05-05 01:45:25 -0700
  • fe80f10ed7 Backprop test fixes and allocator cleanup Jan Wassenberg 2025-04-29 03:00:32 -0700
  • 160a5824fb Cleanup: include fixes/comments, fix leak, vector reserve Jan Wassenberg 2025-04-22 12:01:00 -0700
  • ba10c88a94 Add C API and C# interop files The gemma.cpp Authors 2025-04-22 10:35:12 -0700
  • f20da328de Merge pull request #539 from prajwalc22:feature-prompt-flag Copybara-Service 2025-04-22 03:09:19 -0700
  • 2407150f84 Merge branch 'feature-prompt-flag' of github.com:prajwalc22/gemma.cpp into feature-prompt-flag prajwalc22 2025-04-17 23:54:46 +0530
  • a9e56c27eb removed unnecessary threading.h import prajwalc22 2025-04-17 23:44:23 +0530
  • 09dfb144c0 Merge branch 'dev' into feature-prompt-flag Prajwal Choudhari 2025-04-17 18:53:28 +0530
  • f55c321397 Address review feedback: Fix prefill_tbatch_size and variable placement issues prajwalc22 2025-04-17 10:15:21 +0530
  • 27c28cc938 Address review feedback: Fix prefill_tbatch_size and variable placement issues prajwalc22 2025-04-17 10:15:05 +0530
  • 87a658b1c6 Minor cleanup, on-demand NUQ buffer allocation Jan Wassenberg 2025-04-16 10:48:56 -0700