[SYCL] Update SYCL.md for binary package for Windows (#20401)

* add download binary package
* update prefix

commit ecac98ee53 (parent 182acfe5c5)
@@ -382,17 +382,27 @@ use 1 SYCL GPUs: [0] with Max compute units:512
## Windows
### Install GPU driver

The Intel GPU driver installation guide and download page can be found here: [Get Intel GPU Drivers](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/software/drivers.html).
### Option 1: download the binary package directly

Download the binary package for Windows from: https://github.com/ggml-org/llama.cpp/releases.

Extract the package to a local folder and run the llama tools directly. Refer to [Run the inference](#iii-run-the-inference-1).

Note: the package includes the SYCL runtime and all dependent DLL files, so there is no need to install the oneAPI package or activate its environment.
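As a rough sketch, fetching and running a release from a terminal could look like the following (the asset file name below is hypothetical; pick the actual SYCL Windows asset from the releases page):

```sh
# Hypothetical asset name: check the releases page for the real one.
curl -L -o llama-sycl-win.zip https://github.com/ggml-org/llama.cpp/releases/latest/download/llama-sycl-win.zip

# Extract to a local folder (tar can extract .zip archives on recent Windows).
mkdir llama-sycl
tar -xf llama-sycl-win.zip -C llama-sycl

# Run a tool directly; no oneAPI installation or activation is needed.
.\llama-sycl\llama-cli.exe --help
```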
### Option 2: build locally from the source code

#### I. Setup environment

1. Install Visual Studio
If you already have a recent version of Microsoft Visual Studio, you can skip this step. Otherwise, please refer to the official download page for [Microsoft Visual Studio](https://visualstudio.microsoft.com/).
2. Install Intel® oneAPI Base toolkit

The SYCL backend depends on:
- Intel® oneAPI DPC++/C++ compiler and runtime.
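After installing the toolkit, the compiler environment is activated from a oneAPI command prompt or by calling `setvars.bat`; the path below is the installer's default location, so adjust it if you installed elsewhere:

```sh
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```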
@@ -443,25 +453,25 @@ Output (example):
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28044]
```
3. Install build tools

a. Download & install CMake for Windows: https://cmake.org/download/ (CMake can also be installed from the Visual Studio Installer.)

b. Recent versions of Visual Studio install Ninja by default. (If not, please install it manually: https://ninja-build.org/)
#### II. Build llama.cpp

You can download the release package for Windows directly, which includes the binary files and the dependent oneAPI DLL files.

Choose one of the following methods to build from source code.

##### Option 1: Script
```sh
.\examples\sycl\win-build-sycl.bat
```

##### Option 2: CMake
On the oneAPI command line window, step into the llama.cpp main directory and run the following:
@@ -490,7 +500,7 @@ cmake --preset x64-windows-sycl-debug
cmake --build build-x64-windows-sycl-debug -j --target llama-completion
```
##### Option 3: Visual Studio
You have two options to use Visual Studio to build llama.cpp:
- As a CMake project using CMake presets.
@@ -500,7 +510,7 @@ You have two options to use Visual Studio to build llama.cpp:
All the following commands are executed in PowerShell.
###### - Open as a CMake Project
You can use Visual Studio to open the `llama.cpp` folder directly as a CMake project. Before compiling, select one of the SYCL CMake presets:
@@ -515,7 +525,7 @@ You can use Visual Studio to open the `llama.cpp` folder directly as a CMake pro
cmake --build build --config Release -j --target llama-completion
```
###### - Generating a Visual Studio Solution
You can use a Visual Studio solution to build and work on llama.cpp on Windows. You need to convert the CMake project into a `.sln` file.
@@ -603,7 +613,7 @@ found 2 SYCL devices:
```
##### Choose level-zero devices
|Chosen Device ID|Setting|
|-|-|
@@ -611,7 +621,7 @@ found 2 SYCL devices:
|1|`set ONEAPI_DEVICE_SELECTOR="level_zero:1"`|
|0 & 1|`set ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"` or `set ONEAPI_DEVICE_SELECTOR="level_zero:*"`|
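For example, to pin a run to device 1 from a `cmd` window, set the variable and then launch a tool. The model path is a placeholder, the options follow the common llama.cpp CLI flags, and in PowerShell you would use `$env:ONEAPI_DEVICE_SELECTOR = "level_zero:1"` instead:

```sh
set ONEAPI_DEVICE_SELECTOR="level_zero:1"
build\bin\llama-completion.exe -m models\your-model.gguf -ngl 99
```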
##### Execute

Choose one of the following methods to run.
@@ -669,7 +679,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
## Environment Variable
### Build
| Name | Value | Function |
|--------------------|---------------------------------------|---------------------------------------------|
@@ -684,7 +694,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
1. FP32 and FP16 have different performance impacts on LLMs. It is recommended to test both for better prompt processing performance on your models. You need to rebuild the code after changing `GGML_SYCL_F16=OFF/ON`.
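For instance, a Windows rebuild with FP16 enabled might look like this (a sketch: the generator and compiler flags mirror the CMake build option earlier in this document, and should be adjusted to your setup):

```sh
cmake -B build -G "Ninja" -DGGML_SYCL=ON -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=icx -DCMAKE_BUILD_TYPE=Release -DGGML_SYCL_F16=ON
cmake --build build -j --target llama-completion
```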
### Runtime
| Name | Value | Function |
|-------------------|------------------|---------------------------------------------------------------------------------------------------------------------------|
@@ -777,7 +787,7 @@ use 1 SYCL GPUs: [0] with Max compute units:512
```
### **GitHub contribution**:
Please add the `[SYCL]` prefix/tag in issue/PR titles to help the SYCL contributors check and address them without delay.
## TODO