From 86761dc113c28b2c40b20877b98adc63adcb9cda Mon Sep 17 00:00:00 2001
From: Omar Sanseviero
Date: Fri, 1 Mar 2024 23:44:38 +0100
Subject: [PATCH 1/3] Update README.md

---
 README.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 331d96f..2c7b5a7 100644
--- a/README.md
+++ b/README.md
@@ -65,15 +65,25 @@ winget install --id Kitware.CMake
 winget install --id Microsoft.VisualStudio.2022.BuildTools --force --override "--passive --wait --add Microsoft.VisualStudio.Workload.VCTools;installRecommended --add Microsoft.VisualStudio.Component.VC.Llvm.Clang --add Microsoft.VisualStudio.Component.VC.Llvm.ClangToolset"
 ```
 
-### Step 1: Obtain model weights and tokenizer from Kaggle
+### Step 1: Obtain model weights and tokenizer from Kaggle or Hugging Face Hub
 
 Visit [the Gemma model page on
-Kaggle](https://www.kaggle.com/models/google/gemma) and select `Model Variations
+Kaggle](https://www.kaggle.com/models/google/gemma/frameworks/gemmaCpp) and select `Model Variations
 |> Gemma C++`. On this tab, the `Variation` dropdown includes the options below.
 Note bfloat16 weights are higher fidelity, while 8-bit switched floating point
 weights enable faster inference. In general, we recommend starting with the
 `-sfp` checkpoints.
 
+Alternatively, visit the [gemma.cpp](https://huggingface.co/models?other=gemma.cpp)
+models on the Hugging Face Hub. First go to the model repository of the model of interest
+(see recommendations below). Then, click the `Files and versions` tab and download the
+model and tokenizer files. For programmatic downloading, if you have `huggingface_hub`
+installed, you can also run:
+
+```
+huggingface-cli download google/gemma-2b-cpp --local-dir build/
+```
+
 2B instruction-tuned (`it`) and pre-trained (`pt`) models:
 
 | Model name | Description |
@@ -96,7 +106,7 @@ weights enable faster inference. In general, we recommend starting with the
 > **Important**: We strongly recommend starting off with the `2b-it-sfp` model to
 > get up and running.
 
-### Step 2: Extract Files
+### Step 2: Extract Files (if downloading from Kaggle)
 
 After filling out the consent form, the download should proceed to retrieve a
 tar archive file `archive.tar.gz`. Extract files from `archive.tar.gz` (this can

From 8c857b957e21a3143d9a3c5eca5e3bff28c8a151 Mon Sep 17 00:00:00 2001
From: Omar Sanseviero
Date: Mon, 4 Mar 2024 12:58:49 +0100
Subject: [PATCH 2/3] Update README.md

---
 README.md | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 2c7b5a7..4dc4a91 100644
--- a/README.md
+++ b/README.md
@@ -78,11 +78,12 @@ Alternatively, visit the [gemma.cpp](https://huggingface.co/models?other=gemma.c
 models on the Hugging Face Hub. First go to the model repository of the model of interest
 (see recommendations below). Then, click the `Files and versions` tab and download the
 model and tokenizer files. For programmatic downloading, if you have `huggingface_hub`
-installed, you can also run:
+installed, you can also download by running:
 
 ```
-huggingface-cli download google/gemma-2b-cpp --local-dir build/
-```
+huggingface-cli login # Just the first time
+huggingface-cli download google/gemma-2b-sfp-cpp --local-dir build/
+```
 
 2B instruction-tuned (`it`) and pre-trained (`pt`) models:
 
@@ -106,7 +107,9 @@ huggingface-cli download google/gemma-2b-cpp --local-dir build/
 > **Important**: We strongly recommend starting off with the `2b-it-sfp` model to
 > get up and running.
 
-### Step 2: Extract Files (if downloading from Kaggle)
+### Step 2: Extract Files
+
+If you downloaded the models from Hugging Face, skip to step 3.
 
 After filling out the consent form, the download should proceed to retrieve a
 tar archive file `archive.tar.gz`. Extract files from `archive.tar.gz` (this can

From 3cdd5e524a4c74de49e936cb5d9580d56aacfe07 Mon Sep 17 00:00:00 2001
From: Jan Wassenberg
Date: Tue, 5 Mar 2024 23:00:09 -0800
Subject: [PATCH 3/3] Fix loop iteration in GeluMulToBF16

Also attempt to speed up builders (parallel)

PiperOrigin-RevId: 613092863
---
 .github/workflows/build.yml | 2 +-
 ops.h                       | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 06f4dfa..a0e9dc2 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -44,7 +44,7 @@ jobs:
           -D CMAKE_CXX_COMPILER_LAUNCHER=ccache
 
       - name: Build
-        run: cmake --build ${{ github.workspace }}/build --preset ${{ matrix.preset }} --config ${{ matrix.build_type }}
+        run: cmake --build ${{ github.workspace }}/build --preset ${{ matrix.preset }} --config ${{ matrix.build_type }} -j 4
 
       - name: Archive production artifacts
         uses: actions/upload-artifact@v4
diff --git a/ops.h b/ops.h
index 8f92d82..3725776 100644
--- a/ops.h
+++ b/ops.h
@@ -241,7 +241,7 @@ static HWY_NOINLINE HWY_MAYBE_UNUSED void GeluMulToBF16(
   size_t i = 0;
   if (size >= 2 * NF) {
-    for (; i < size - 2 * NF; i += 2 * NF) {
+    for (; i <= size - 2 * NF; i += 2 * NF) {
       const VF mul0 = hn::LoadU(df, mul + i);
       const VF mul1 = hn::LoadU(df, mul + i + NF);
       const VF g0 = hn::Mul(mul0, Gelu(df, hn::LoadU(df, gelu_in + i)));