From 9ba324726aa7b01d8e58336e3609aac055054103 Mon Sep 17 00:00:00 2001
From: Yamini Nimmagadda
Date: Mon, 12 Jan 2026 17:29:46 -0800
Subject: [PATCH] Update OPENVINO.md

---
 docs/backend/OPENVINO.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/backend/OPENVINO.md b/docs/backend/OPENVINO.md
index bc3a2c66cd..7c2e733b03 100644
--- a/docs/backend/OPENVINO.md
+++ b/docs/backend/OPENVINO.md
@@ -93,6 +93,9 @@ The OpenVINO backend can be configured using the following environment variables
 | `GGML_OPENVINO_DUMP_IR` | Export OpenVINO IR files with timestamps. |
 | `GGML_OPENVINO_DEBUG_INPUT` | Enable input debugging. |
 | `GGML_OPENVINO_DEBUG_OUTPUT` | Enable output debugging. |
+| `GGML_OPENVINO_STATEFUL_EXECUTION`* | Enable stateful execution for better performance. |
+
+*`GGML_OPENVINO_STATEFUL_EXECUTION` is an **experimental** feature that enables stateful execution, managing the KV cache internally inside the OpenVINO model to improve performance on CPUs and GPUs. Stateful execution is not effective on NPUs, and not all models currently support it. It has been validated only with the llama-simple, llama-cli, llama-bench, and llama-run applications, where enabling it is recommended for best performance. Other applications, such as llama-server and llama-perplexity, are not yet supported.

 ### Example Usage