diff --git a/docs/backend/OPENVINO.md b/docs/backend/OPENVINO.md
index d56c61d8a8..bc3a2c66cd 100644
--- a/docs/backend/OPENVINO.md
+++ b/docs/backend/OPENVINO.md
@@ -52,7 +52,7 @@ Accuracy and performance optimizations for quantized models are still work in pr
 
 - **Primary supported quantization scheme is `Q4_0`**
 - `Q4_0` and `Q4_1` tensors are requantized to int4 gs128 symmetric
-- `Q6_K` tensors are dequantized to FP16
+- `Q6_K` tensors are requantized to int8, except for the token embedding matrix
 
 #### Additional Notes
 
@@ -72,30 +72,17 @@ The following models have been validated for functionality on Intel® Core™ Ul
 - [openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)
 - [tencent/Hunyuan-7B-Instruct](https://huggingface.co/tencent/Hunyuan-7B-Instruct)
 - [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)
+- [bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF](https://huggingface.co/bartowski/DeepSeek-R1-Distill-Llama-8B-GGUF)
 
 ## Build Instructions
 
-### Prerequisites
+For detailed build instructions, refer to [build.md](../build.md#openvino).
 
-- OpenVINO runtime and development packages
-- CMake
-- C++17-compatible compiler
-
-### Build Example
-
-```bash
-cmake -B build/ReleaseOV \
-  -DGGML_OPENVINO=ON \
-  -DCMAKE_BUILD_TYPE=Release
-
-cmake --build build/ReleaseOV -j
-```
-
-# Runtime Configuration
+## Runtime Configuration
 
 The OpenVINO backend can be configured using the following environment variables at runtime to control device selection, caching, debugging, and profiling behavior.
 
-## Configuration Options
+### Configuration Options
 
 | Variable | Description |
 |--------|-------------|
@@ -107,9 +94,9 @@ The OpenVINO backend can be configured using the following environment variables
 | `GGML_OPENVINO_DEBUG_INPUT` | Enable input debugging. |
 | `GGML_OPENVINO_DEBUG_OUTPUT` | Enable output debugging. |
 
-## Example Usage
+### Example Usage
 
-### GPU Inference with Profiling
+#### GPU Inference with Profiling
 
 ```bash
 export GGML_OPENVINO_CACHE_DIR=/tmp/ov_cache
@@ -122,7 +109,7 @@ export GGML_OPENVINO_DEVICE=GPU
 "The story of AI is "
 ```
 
-### llama-bench
+#### llama-bench
 
 ```bash
 GGML_OPENVINO_DEVICE=GPU ./llama-bench -fa 1
@@ -131,11 +118,16 @@ GGML_OPENVINO_DEVICE=GPU ./llama-bench -fa 1
 
 ### NPU Notes
 
-- Prompt processing is currently slower than CPU/GPU
 - Smaller context sizes are recommended (e.g. `-c 512`)
 - Static compilation mode is enabled automatically
 - Model caching is not yet supported
-
+- `llama-server` with `-np` > 1 (multiple parallel sequences) is not supported
+- `llama-perplexity` is only supported with `-b 512` or smaller
+
+## Llama.cpp Tools
+
+The following tools work with the OpenVINO backend on CPU and GPU: `llama-simple`, `llama-run`, `llama-cli`, `llama-server`, `llama-bench`, and `llama-perplexity`.
+
 ## Work in Progress
 
 - Performance and memory optimizations
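
Illustrative addendum (not part of the patch): combining the runtime configuration variables with the NPU constraints listed above, an NPU run might look like the sketch below. The `NPU` device value, the binary location, and the model path are assumptions for illustration, not values taken from this diff.

```bash
# Hypothetical NPU invocation respecting the documented constraints:
# small context (-c 512) and no reliance on model caching.
# The device value, binary location, and model path are placeholders.
export GGML_OPENVINO_DEVICE=NPU

./build/ReleaseOV/bin/llama-cli \
    -m /path/to/model.gguf \
    -c 512 \
    -n 64 \
    -p "The story of AI is "
```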