gemma.cpp

Commit Graph

Branches: dev, main, test_648971168, test_655310563, test_683370269, test_686665933, test_695436079, test_730754638, test_802567656, test_822643310, test_824820179, test_825613752, test_835158854, test_835159997, test_835160876, test_836654012, test_840724686, test_841765739, test_845238905, test_845310753

Tags: v0.1.0, v0.1.1, v0.1.2, v0.1.3, v0.1.4

716713f0e6 Update .gitignore to exclude build directory and model files prajwalc22 2025-04-16 09:52:30 +0530
01caf379ba Update .gitignore to exclude build directory and model files prajwalc22 2025-04-15 08:21:19 +0530
87a1c76578 Update CMake configuration and documentation for --prompt flag prajwalc22 2025-04-15 08:16:02 +0530
f3116d2577 Add --prompt flag for non-interactive mode prajwalc22 2025-04-12 13:22:48 +0530
7164a5e844 Internal change. The gemma.cpp Authors 2025-04-12 20:27:14 -0700
2e722f14f1 Add mmap support (not yet used) Jan Wassenberg 2025-04-10 10:02:58 -0700
8532da47f7 Major refactor of allocator/args: Jan Wassenberg 2025-04-10 01:28:16 -0700
bef91a3f03 Merge pull request #529 from ufownl:refactor/wrap_and_tokenize Copybara-Service 2025-04-08 09:22:26 -0700
5d4f7e0f7e Add new singleton Allocator2 instead of monostate Jan Wassenberg 2025-04-08 09:00:18 -0700
4e6aa36e9b Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes Jan Wassenberg 2025-04-08 03:35:08 -0700
cc2e14e654 Improve `GemmaChatTemplate` to handle vision prompt wrapping RangerUFO 2025-03-27 15:57:53 +0800
c39295f497 Inline the ctor of `GemmaChatTemplate` RangerUFO 2025-03-27 14:01:56 +0800
d1615b56b2 Fix the prompt wrapping of gemma3-1b again RangerUFO 2025-03-26 18:27:09 +0800
ca4ee2b63f Refactor `WrapAndTokenize` to work properly with Gemma3 RangerUFO 2025-03-26 18:19:05 +0800
76a81ac2d6 Fix unaligned buffer causing crash on GCC. Thanks @ufownl, fixes #508 Jan Wassenberg 2025-03-28 11:24:53 -0700
304dc79430 Update runners to ubuntu-24.04 from deprecated ubuntu-20.04 label Bill Napier 2025-03-26 21:27:48 +0000
e55734219d Fix test threshold and improve warning output Jan Wassenberg 2025-03-26 06:10:50 -0700
4a924f1794 Merge pull request #527 from ufownl:feature/gemma2_secondary_eos v0.1.4 Copybara-Service 2025-03-25 06:44:41 -0700
d42deaa27c Set the secondary EOS for Gemma2 RangerUFO 2025-03-21 19:53:32 +0800
2bad79f110 Fix the EOS checking RangerUFO 2025-03-21 19:26:59 +0800
6300c123ee Update app argument documentation Jan Wassenberg 2025-03-21 06:32:54 -0700
05b1cce9f7 Add support for a secondary EOS token Phil Culliton 2025-03-20 12:27:44 -0700
b1032ebf5f Fix PromptWrapping for gemma3 1B, thanks @ufownl Jan Wassenberg 2025-03-20 05:06:45 -0700
83219e3c68 Add note on attention length and SFP Jan Wassenberg 2025-03-20 00:38:33 -0700
3d419ec173 Merge pull request #523 from ufownl/bugfix/gemma3_1b_wrapping pculliton 2025-03-19 10:30:27 -0400
b16ce9a0b4 Fix the prompt wrapping of gemma3-1b RangerUFO 2025-03-18 16:52:38 +0800
1b72c22345 Refactor Gemma ctor and improve pool NUMA support Jan Wassenberg 2025-03-14 10:18:11 -0700
1b1b63d560 Fix PaliGemma models. v0.1.3 Phil Culliton 2025-03-13 06:27:52 -0700
0ff6b3123a Point out Gemma 3 support in README.md Quirin Niedernhuber 2025-03-12 07:32:31 -0700
5898fa5eb0 Update github actions/cache version Jan Wassenberg 2025-03-12 07:12:22 -0700
4ab601da10 Internal change. Phil Culliton 2025-03-11 23:19:36 -0700
9d83ff202e Internal change. Phil Culliton 2025-03-11 23:10:08 -0700
7cdb0d3874 Internal change. Phil Culliton 2025-02-28 16:04:54 -0800
b00e8a7bcf naming scheme between gemma and gemma2 variants on the command line was not consistent The gemma.cpp Authors 2025-02-18 16:36:48 -0800
de5bab65b4 Use a set's `find` method when looking for reject tokens. The gemma.cpp Authors 2025-02-26 08:42:36 -0800
2bdf26d81d Support bf16 output of Matmul Jan Wassenberg 2025-02-25 17:52:50 -0800
1f916b686b Adds: - GemmaContext class that exposes Gemma functionality - C API that uses GemmaContext - C# interop class in GemmaInterop.cs - New END_OF_TURN_ID in tokenizer.h, useful when dealing with instruction-tuned prompts test_730754638 The gemma.cpp Authors 2025-02-24 23:59:12 -0800
b3b4b9f92f With new matmul, much larger batch sizes are advantageous, default to 256. Jan Wassenberg 2025-02-24 10:21:21 -0800
9a2360d719 Move batch_bench into test section, add GTest dep. Fixes #501 Jan Wassenberg 2025-02-21 05:33:14 -0800
f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning Jan Wassenberg 2025-02-20 08:32:52 -0800
ad8dd21e1d Internal change. Phil Culliton 2025-02-14 08:59:08 -0800
d854471ae2 Use vectorized TopK using highway VQSelect Apoorv Reddy 2025-02-18 05:00:53 -0800
0e5b59d24d Implements FusedSoftmaxAndSampleTopK. Apoorv Reddy 2025-02-16 21:29:05 -0800
bdf5d25e97 Only temporarily enable spinning in threading benchmark Jan Wassenberg 2025-02-14 17:15:10 -0800
06c70dccd9 Less verbose threading_test output, improve formatting. Jan Wassenberg 2025-02-13 00:55:55 -0800
f173aa776e Add conversion tool for HF safetensors to gemma.cpp for PaliGemma. Daniel Keysers 2025-02-12 03:46:56 -0800
c495b25995 Merge pull request #493 from ufownl:bugfix/compress_weights_le Copybara-Service 2025-02-11 05:10:13 -0800
64cf6dfe0a Using TimingInfo methods and cleaning up args to DecodeStepT Apoorv Reddy 2025-02-11 04:47:39 -0800
953c877658 Fix nuq Enc() to handle groups < kGroupSize. Jan Wassenberg 2025-02-10 07:17:10 -0800
5563d94811 Add fork/join latency benchmark Jan Wassenberg 2025-02-10 05:23:08 -0800
780e376023 Add KVCache.DeepCopy() . Will be useful for implementing sampling functionality like beam sampling, parallel sampling, CoT Decoding (à la https://arxiv.org/abs/2402.10200) Apoorv Reddy 2025-02-10 04:09:54 -0800
9b3e7ea8a2 Factor out DecodeStepT from GenerateT into a separate function. Apoorv Reddy 2025-02-10 03:52:29 -0800
b0fe9a43e6 Further speed up blob_compare: single alloc, use dual sockets Jan Wassenberg 2025-02-09 10:53:23 -0800
3a5a6dbcad Fix the link error when building `compress_weights` with Clang on macOS RangerUFO 2025-02-09 00:13:25 +0800
b18bd781f6 Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts Jan Wassenberg 2025-02-07 07:38:20 -0800
c822957fce Windows build fixes: struct vs class, unused arg/var, avoid VLA Jan Wassenberg 2025-02-06 23:00:47 -0800
82ca526c0c Remove `srcs_version` and `python_version` attributes, as they already default to `"PY3"` Oleh Prypin 2025-02-06 16:50:42 -0800
f31e12e63b Improved blob diff: parallel, tolerance for float Jan Wassenberg 2025-02-06 13:45:47 -0800
9f5159ff68 Public visibility for compression/ Jan Wassenberg 2025-02-05 08:53:19 -0800
7ccc6abe87 Allow conversion, loading and inference with NUQ. Phil Culliton 2025-02-05 07:45:18 -0800
8a6edff319 Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests. Phil Culliton 2025-01-31 10:34:57 -0800
c5c85e09fd Merge 123bf7eebb into 23dac72463 copybara-service[bot] 2025-01-29 19:58:46 +0000
23dac72463 Simplified interface class and example for Gemma.cpp usage. Phil Culliton 2025-01-28 08:47:55 -0800
7af2e70321 Add python wrappers for configs and inference. Enable building compression/python/compression_test using bazel. Add default image path for image_test and paligemma_test. Daniel Keysers 2025-01-28 08:21:24 -0800
bcdb0d65bd Assorted small cleanups. Daniel Keysers 2025-01-28 06:09:08 -0800
a248f76245 Allow overriding num threads despite detecting topology Jan Wassenberg 2025-01-27 08:57:08 -0800
e997468496 Apply PositionalEncodingQK always in-place. Daniel Keysers 2025-01-23 07:08:50 -0800
ce807a31a1 internal change Apoorv Reddy 2025-01-23 05:28:51 -0800
a60b564b88 Infra improvements (2) Jan Wassenberg 2025-01-23 01:54:50 -0800
f37402da57 Add parameter for base_frequency to CreateInvTimeScale(). Extract a few local variables to make code easier to read (hopefully). Daniel Keysers 2025-01-23 00:56:04 -0800
a133b3d062 Tiny fix: align template parameter order with parameter order. Daniel Keysers 2025-01-22 09:12:55 -0800
9646edc908 Internal change Phil Culliton 2025-01-21 07:53:22 -0800
f46052b5b4 Merge pull request #473 from ufownl:bugfix/migrate_weights_target Copybara-Service 2025-01-20 08:05:38 -0800
c4398fc72d Infra improvements: Jan Wassenberg 2025-01-20 06:22:17 -0800
20e5ef6d2e Add the missing `migrate_weights` target for CMake RangerUFO 2025-01-17 18:56:43 +0800
493688f6f1 Allow interactive use with new single-file weight format. Add section about new weights format to README.md. Remove model_type_required parameter. Update error handling for flags. Daniel Keysers 2025-01-15 07:22:00 -0800
b93231a47d Moved the vit config fields to their own config struct Ray Smith 2025-01-15 01:09:16 -0800
9d40f0117e Added ability to load/save a complete model file, including tokenizer. Ray Smith 2024-12-19 07:59:08 -0800
29e3a1bba9 Merge 51a708e957 into 5bc356f18f Nanubala Gnana Sai 2024-12-18 12:16:32 +0000
5bc356f18f Internal change The gemma.cpp Authors 2024-12-17 15:15:21 -0800
73766e8ee3 Small updates to the README file. Daniel Keysers 2024-12-17 04:09:17 -0800
62c70d6715 Rename ModelTraining to PromptWrapping which is a more accurate name. Daniel Keysers 2024-12-13 07:45:25 -0800
6254f2e5ca Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT Ray Smith 2024-12-11 06:29:57 -0800
aed17396be Make prompt wrapping more consistent and fix duplicated tokens for multi-turn. Do not echo <end_of_turn> tokens to the user. Have verbosity=0 only show the dialog. Daniel Keysers 2024-12-11 01:51:29 -0800
e69bc3bc1c Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future. Corrected some errors in the TensorIndex. Ray Smith 2024-12-11 01:26:05 -0800
7b77909427 Fix unhandled switch warning/error Jan Wassenberg 2024-12-10 13:32:21 -0800
642fc97d51 Internal change Jan Wassenberg 2024-12-10 06:57:59 -0800
d8135e836f Merge pull request #460 from ericcurtin:common Copybara-Service 2024-12-10 06:33:37 -0800
5bbe814a53 Tiny cleanup. Daniel Keysers 2024-12-10 03:33:25 -0800
331d2ccc02 Add support for 448px resolution to PaliGemma and PaliGemma2. Daniel Keysers 2024-12-09 11:37:37 -0800
a971088ac2 Refactor `gemma/common.cc` to improve readability and safety Eric Curtin 2024-12-08 17:30:17 -0300
278f2d148f Refactor `gemma/common.cc` to improve readability and safety Eric Curtin 2024-12-08 17:30:17 -0300
66bb435121 No public description The gemma.cpp Authors 2024-12-09 00:48:59 -0800
9dfe2a76be Internal change Phil Culliton 2024-12-04 20:41:07 -0800
6a34e9c547 Print cache info and update Highway version for that Jan Wassenberg 2024-12-03 06:31:15 -0800
f74d496879 Threading/infra improvements. Jan Wassenberg 2024-11-27 01:11:20 -0800
51a708e957 Merge branch 'dev' into feature/ISS-60/implement-self-extend Nanubala Gnana Sai 2024-11-25 19:08:50 +0530
109a4d9f85 Add a simple benchmark for batching. Stanko Novakovic 2024-11-21 10:59:16 -0800
3d1625d8c5 Improved consistency of compressor API, and added a universal method with a target type arg. Moved configs pybind up to root level. Ray Smith 2024-11-21 05:27:02 -0800
e8601b2415 Merge branch 'dev' into feature/ISS-60/implement-self-extend Nanubala Gnana Sai 2024-11-19 23:41:45 +0530