gemma.cpp

Commit Graph

cc1d256cff Update CMakePresets.json Hitesh K V 2025-10-16 12:08:29 +0530
9b6ed1a58f gemma_batch_bench: generate more unique prompts Jan Wassenberg 2025-10-15 15:45:27 -0700
503aaddd65 Add 8-bit integer quantization (I8Stream) to Gemma.cpp. Phil Culliton 2025-10-15 09:24:38 -0700
ee18916abf Removed the PROFILER_ZONE from the most highly called functions to reduce the overhead. Ray Smith 2025-10-15 07:09:32 -0700
e3e8511e79 Initialization of profiler zones. Ray Smith 2025-10-15 03:05:30 -0700
fb6fa793f4 Added a global (to gemma) zones list to enable most call sites to PROFILER_ZONE3 to avoid the synchronization required for the static const initialization of the zone handle. Improved flash_attention to enable profiling using the new zones. Ray Smith 2025-10-14 08:30:23 -0700
3e9bb7df80 Update README.md Hitesh K V 2025-10-10 11:33:09 +0530
035273c184 tune pool kSpin mode in threading_context Jan Wassenberg 2025-10-07 08:35:44 -0700
9dc802c7aa Add logging to io.cc on failed write and read. Nitin Gangahar 2025-10-06 10:25:07 -0700
684a0444e9 Reduced parallelism for TransposeQ, making each thread read and write within its own cache lines Ray Smith 2025-10-02 08:14:37 -0700
277f396710 Reduced parallelism for TransposeQ, making each thread read and write within its own cache lines Ray Smith 2025-10-02 05:00:19 -0700
14244664c8 Avoid transposing Q when it isn't needed Ray Smith 2025-10-02 05:16:03 -0700
fe5a39990e Improve FlashAttention threading: Jan Wassenberg 2025-10-02 02:36:29 -0700
6098a022b3 Increased parallelism for RMSNormAndPositionalEncoding Ray Smith 2025-10-01 07:10:40 -0700
2f6cbde8ff Added a smaller tile size to flash attention for smaller batch sizes Ray Smith 2025-09-30 05:48:50 -0700
4974f24832 Fixed bug with softcap in single flash attention Ray Smith 2025-09-30 02:17:18 -0700
16536996d1 Remove less useful spammy log lines. Nitin Gangahar 2025-09-29 02:28:04 -0700
667a3f117a Utilize multiple cores to read weight batches. Nitin Gangahar 2025-09-26 11:27:56 -0700
d15731d201 Used hn::BroadcastLane instead of Set(..., x.raw) Ray Smith 2025-09-25 09:41:30 -0700
4f0c633248 (1) Added QueryResultAndMetrics and BatchQueryModelWithMetrics to also return TimingInfo besides query results. Charles Zhao 2025-09-23 17:01:56 -0700
fac8aac4cb Internal change Jan Wassenberg 2025-09-22 05:36:32 -0700
501fdf000e Remove no longer used MatVec Jan Wassenberg 2025-09-19 09:02:44 -0700
b603425bf3 Fix batch inference: dangling reference Jan Wassenberg 2025-09-16 08:01:21 -0700
f3bc1c17da 1.03x speedup: fused FFN Jan Wassenberg 2025-09-15 10:25:59 -0700
59db30e209 add const restriction for benchmark_helper.cc, and paligemma_helper.cc to remove a few unnecessary copies. Charles Zhao 2025-09-14 16:26:55 -0700
c9b8479f7d Added zero-initialization to att_out. Re-enabled flash attention when HWY_NATIVE_DOT_BF16 is not available. Ray Smith 2025-09-12 07:47:36 -0700
2695aab5d2 Temporarily disable flash pending msan fix Jan Wassenberg 2025-09-10 07:25:07 -0700
ba6131311a Fix gemma_batch_bench for flash attention Jan Wassenberg 2025-09-10 05:32:03 -0700
9457258330 Refactor MatMul to accept views in the kernel functions Jan Wassenberg 2025-09-09 22:09:09 -0700
f10ac41a20 Added flash attention, with both a single-q function, and a register-tiled function. The register-tiled version achieves a speed-up by a factor of about 9.7 over the previous attention function on an AVX3-enabled machine. Ray Smith 2025-09-09 08:04:45 -0700
24b1760f03 Refactor: move Worker to ThreadingContext, factor out MMDecompress Jan Wassenberg 2025-09-09 07:55:39 -0700
461a9c7d1b Matmul refactoring towards fusion Jan Wassenberg 2025-09-09 07:13:03 -0700
34ceee6c30 Update MatMul comments, removing mention of partial. Jan Wassenberg 2025-09-09 05:56:57 -0700
a5ab99e4ba Memory use reduction: smaller/single MMStorage Jan Wassenberg 2025-09-09 05:32:20 -0700
06e5da1e22 Cleanup: split CacheInfo from Allocator, MatMul helper functions Jan Wassenberg 2025-09-08 02:23:29 -0700
6e52a835c6 Faster startup on tsan: use hierarchical parallelism for BF16 conversion Jan Wassenberg 2025-09-07 22:50:01 -0700
cbe24eac51 1.15x speedup: parallel sampling, enabled by new RNG Jan Wassenberg 2025-09-05 07:23:33 -0700
ad7d7a2713 Further adjust dot_test threshold (numerics) Jan Wassenberg 2025-09-05 05:49:35 -0700
2b4c16e243 Remove Griffin support Jan Wassenberg 2025-09-05 02:34:54 -0700
56186193c1 Replace mt19937 with new generator to enable parallel sampling Jan Wassenberg 2025-09-04 23:48:37 -0700
5d1693e806 Internal change Jan Wassenberg 2025-09-04 10:30:42 -0700
afd82376a5 Add AES-CTR RNG for parallel sampling (not yet used) Jan Wassenberg 2025-09-04 05:58:08 -0700
4be4799727 Remove kMaxPackages and per-package-related code Jan Wassenberg 2025-09-04 03:32:35 -0700
7263ab8445 MatMul simplification, threading strategy improvements Jan Wassenberg 2025-09-03 21:44:39 -0700
74ffe079c4 Create separate MMStorage objects per cluster. Marie White 2025-09-03 09:35:13 -0700
c783b82a82 Internal change Phil Culliton 2025-09-03 08:35:20 -0700
b7b3d353db Simplify MatMul: remove F32 special case (build time) Jan Wassenberg 2025-09-02 04:28:49 -0700
1e3c853e80 Add ParallelFor wrapper function and one new mode Jan Wassenberg 2025-09-02 01:39:28 -0700
3737224132 Add in-cluster parallel policy. Update policy to include cluster_idx. Marie White 2025-09-02 00:14:05 -0700
27cb8e12d9 Handle non-threading parallel policy. Marie White 2025-09-02 00:02:18 -0700
0d2e74d74a Add MMOptions as an argument to Matmul. Marie White 2025-09-01 23:46:07 -0700
229bd078a1 1.29x speedup: bf16 C1/C2. Extend most ops to any type, expand test coverage. Jan Wassenberg 2025-09-01 06:32:24 -0700
bc0c0bac8b Add non-threading parallel policy. Marie White 2025-08-29 08:38:19 -0700
00b70f69c5 Include parallelism type in DoMatMul. Also remove package handling. Marie White 2025-08-29 08:04:05 -0700
0ae8646731 Fix remainder handling for Paligemma Jan Wassenberg 2025-08-29 07:25:14 -0700
973e284ed6 Refactor Matmul to use a policy class for parallelization. Marie White 2025-08-29 05:40:06 -0700
6c39a2dea4 1.01x speedup: More bf16 activations to reduce DecompressA. Jan Wassenberg 2025-08-29 03:18:28 -0700
7288891439 Remove F64 partial storage in matmul. Jan Wassenberg 2025-08-29 00:11:31 -0700
31c09cca4c f32 LoopKC: 1.37x(M=512), 1.19(M=128) single-K F32,BF16 matmul speedup on SKX Jan Wassenberg 2025-08-28 08:55:15 -0700
98ddc166db Expand ThreadingContext comments Jan Wassenberg 2025-08-28 08:31:25 -0700
6128e758ff Change ffw_out from B16 to F32. Marie White 2025-08-28 00:01:01 -0700
85cc51795c Internal change. The gemma.cpp Authors 2025-08-26 08:07:23 -0700
5411fd846d Minor: batched NotifyGenerate, fix comment/dep Jan Wassenberg 2025-08-26 23:32:43 -0700
86afd53076 1.04x speedup: Parallelize SoftCap Jan Wassenberg 2025-08-26 11:54:48 -0700
ed2f0bd1b0 Fix pos assertions, refs #665 Jan Wassenberg 2025-08-26 04:50:06 -0700
9bf0fe4e37 Internal change Jan Wassenberg 2025-08-26 04:43:26 -0700
d3a5ddf657 Merge pull request #663 from junjihashimoto:feature/api-server Jan Wassenberg 2025-08-24 11:57:05 +0200
73f1140dca Fix an off-by-one error after StreamAndUpdateEOS() to remove the MSAN warning about reading an uninitialized variable in the kv_cache. Rhett Stucki 2025-08-20 22:59:24 -0700
41321611fd feature: add API server and client with Google protocol Junji Hashimoto 2025-08-20 11:05:09 +0900
41a86d41a9 Fix preadv error: only enable if we have a handle Jan Wassenberg 2025-08-15 06:30:07 -0700
78573b6718 Internal change. Add deduction for 270M. Phil Culliton 2025-08-14 08:04:10 -0700
d044801c1d Internal change Phil Culliton 2025-08-13 09:47:05 -0700
71406cf6d0 More profiler interface fixes: hwy:: plus avoid ADD_ZONE Jan Wassenberg 2025-08-13 03:15:07 -0700
faa4102992 (Resubmit) Prepare profiler annotations for new API Jan Wassenberg 2025-08-13 01:37:53 -0700
a2d9133f7d Prepare profiler annotations for new API The gemma.cpp Authors 2025-08-11 17:51:09 -0700
4cbf63e6f0 Prepare profiler annotations for new API Jan Wassenberg 2025-08-11 15:34:20 -0700
eef564e8f0 Prepare profiler annotations for new API Jan Wassenberg 2025-08-08 16:50:54 -0700
2e9c93a609 Merge pull request #649 from KaranocaVe:main Copybara-Service 2025-08-08 10:35:57 -0700
33fbac0880 Exporter updates/fixes Jan Wassenberg 2025-08-04 22:35:59 -0700
4e062d68f7 Update BlobWriter comments, WriteAll->Finalize Jan Wassenberg 2025-08-04 10:00:54 -0700
701841897b Default to disabling per-socket parallelization Jan Wassenberg 2025-08-04 09:48:22 -0700
b56b2f05e4 Automated Code Change Ivo Ristovski List 2025-08-01 13:29:16 -0700
eaf05cd04e Merge 6dd1cd277f into 799c264df3 copybara-service[bot] 2025-08-01 20:11:15 +0000
6dd1cd277f Automated Code Change The gemma.cpp Authors 2025-07-11 05:32:57 -0700
799c264df3 Pre-tune thread pool before matmul Jan Wassenberg 2025-07-31 08:44:47 -0700
32286f0465 Merge branch 'dev' into main KaranocaVe 2025-07-31 22:40:56 +0800
50ee1a3e92 Write SBS progressively. Charles Zhao 2025-07-31 06:05:02 -0700
0ea118ebbe Update run.cc, CMakeLists and README for incompatible code, dependency changes and argument updates KaranocaVe 2025-07-31 00:59:16 +0800
8715eda512 Improved layer idx parsing Jan Wassenberg 2025-07-30 05:49:13 -0700
d831ddce5b Fix file mapping: was letting the smart pointer go out of scope Jan Wassenberg 2025-07-30 04:29:27 -0700
2141d4788d Add IsAppendOnly flag to file and if true, disable parallel writes Jan Wassenberg 2025-07-30 01:51:08 -0700
d22ba2ac96 Update layer index parsing and allow tokenizer override Jan Wassenberg 2025-07-30 01:21:54 -0700
d1638587f0 1.14x batch decode speedup: parallelize RMSNorm ops Jan Wassenberg 2025-07-30 00:54:55 -0700
ac0d751d20 Rename GetModelConfig->Config Jan Wassenberg 2025-07-29 10:17:14 -0700
33fabd4ed1 Internal change. Jeremiah Harmsen 2025-07-29 08:20:36 -0700
e76e29ce11 De-singleton ThreadingContext so callers can pass in their own Jan Wassenberg 2025-07-22 02:07:58 -0700
5474146129 Back to f32 kv_cache, but via typedef Jan Wassenberg 2025-07-21 07:04:55 -0700
56c9196eb6 Add blob_path to config deduction message Jan Wassenberg 2025-07-11 18:58:16 -0700
349c86f2d9 Fix bench_matmul perf regression: A input should be padded Jan Wassenberg 2025-07-11 07:35:52 -0700
4bc44d5678 Minor: ModelWeightsPtrs -> WeightsPtrs Jan Wassenberg 2025-07-11 06:10:51 -0700