From f44c60e995fda6b817dc29756a00bb9c4703a188 Mon Sep 17 00:00:00 2001
From: Yamini Nimmagadda
Date: Tue, 13 Jan 2026 14:33:16 -0800
Subject: [PATCH] Update OPENVINO.md

---
 docs/backend/OPENVINO.md | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/docs/backend/OPENVINO.md b/docs/backend/OPENVINO.md
index 3395b70e60..d69aaedf61 100644
--- a/docs/backend/OPENVINO.md
+++ b/docs/backend/OPENVINO.md
@@ -36,23 +36,15 @@ Accuracy and performance optimizations for quantized models are still work in pr
 
 ## Quantization Support Details
 
-### CPU
+### CPU and GPU
 
 - **`Q4_0`, `Q4_1`, `Q4_K_M`, `Q6_K` models are supported**
-- `Q6_K` tensors (6-bit, gs16 symmetric) are converted to int8 gs16 symmetric
-- `Q5_K` tensors (5-bit, gs32 asymmetric) are converted to int8 gs32 asymmetric
-
-### GPU
-
-- **`Q4_0`, `Q4_1`, `Q4_K_M`, `Q6_K` models are supported**
-- `Q6_K` tensors (6-bit, gs16 symmetric) are requantized to int8 gs32 symmetric
-- `Q5_K` tensors (5-bit, gs32 asymmetric) are converted to int8 gs32 asymmetric
+- `Q5_K` and `Q6_K` tensors are converted to `Q8_0_C`
 
 ### NPU
 
 - **Primary supported quantization scheme is `Q4_0`**
-- `Q4_0` and `Q4_1` tensors are requantized to int4 gs128 symmetric
-- `Q6_K` tensors are requentized to int8 except for the token embedding matrix which is dequantized to fp16
+- `Q6_K` tensors are generally requantized to `Q4_0_128`; embedding weights are instead requantized to `Q8_0_C`, except for the token embedding matrix, which is dequantized to fp16
 
 #### Additional Notes
 
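For readers unfamiliar with the group-wise schemes the patch refers to (int8 values sharing one scale per group, as in `Q8_0`-style formats), here is a minimal NumPy sketch of symmetric per-group quantization and dequantization. The function names and the group size of 32 are illustrative assumptions, not the actual OpenVINO or llama.cpp implementation.

```python
import numpy as np

def quantize_groupwise_int8(w, group_size=32):
    """Illustrative symmetric per-group int8 quantization (sketch only).

    Each contiguous group of `group_size` weights shares one scale,
    chosen so the group's maximum magnitude maps to the int8 edge 127.
    """
    groups = np.asarray(w, dtype=np.float32).reshape(-1, group_size)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0.0, 1.0, scales)  # guard all-zero groups
    q = np.clip(np.rint(groups / scales), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)  # fp16 scale per group

def dequantize_groupwise_int8(q, scales):
    """Reconstruct approximate fp32 weights from int8 values and scales."""
    return q.astype(np.float32) * scales.astype(np.float32)

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, s = quantize_groupwise_int8(w)
w_hat = dequantize_groupwise_int8(q, s).ravel()
# Round-trip error is bounded by roughly half a quantization step per element.
max_err = np.max(np.abs(w - w_hat))
```

A symmetric scheme stores only a scale per group; the asymmetric variants mentioned in the removed lines additionally store a zero point, which helps when a group's values are not centered around zero.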