From e9ed5c4cb655f7fd2f3f0f21b13a7c0da464201c Mon Sep 17 00:00:00 2001
From: Yamini Nimmagadda
Date: Tue, 13 Jan 2026 14:50:44 -0800
Subject: [PATCH] Update OPENVINO.md

---
 docs/backend/OPENVINO.md | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/docs/backend/OPENVINO.md b/docs/backend/OPENVINO.md
index d69aaedf61..87c537f20b 100644
--- a/docs/backend/OPENVINO.md
+++ b/docs/backend/OPENVINO.md
@@ -13,20 +13,15 @@ The OpenVINO backend is implemented in ggml/src/ggml-openvino and provides a tra
 OpenVINO backend supports the following hardware:
 
 - Intel CPUs
-- Intel integrated GPUs
+- Intel integrated and discrete GPUs
 - Intel NPUs (Requires UD32+ driver)
 
 Although OpenVINO supports a wide range of [Intel hardware](https://docs.openvino.ai/2025/about-openvino/release-notes-openvino/system-requirements.html), the llama.cpp OpenVINO backend has been validated specifically on AI PCs such as the Intel® Core™ Ultra Series 1 and Series 2.
 
 ## Supported Model Precisions
 
-### Fully Supported
-
-- FP16 GGUF
-- BF16 GGUF
-
-### Quantized Models (Partial Support)
-
+- `FP16`
+- `BF16` (on Intel Xeon)
 - `Q4_0`
 - `Q4_1`
 - `Q4_K_M`
@@ -46,7 +41,7 @@ Accuracy and performance optimizations for quantized models are still work in pr
 - **Primary supported quantization scheme is `Q4_0`**
 - `Q6_K` tensors are requantized to `Q4_0_128` in general. For embedding weights, `Q6_K` tensors are requantized to `Q8_0_C` except for the token embedding matrix which is dequantized to fp16
 
-#### Additional Notes
+### Additional Notes
 
 - Both `Q4_0` and `Q4_1` models use `Q6_K` for the token embedding tensor and the final matmul weight tensor (often the same tensor)
 - `Q4_0` models may produce some `Q4_1` tensors if an imatrix is provided during quantization using `llama-quantize`