From dc6592431b909208040c1a8e953e6c5440471eaa Mon Sep 17 00:00:00 2001
From: Ruikai Peng <retr0@retr0.blog>
Date: Fri, 20 Mar 2026 17:31:34 +0800
Subject: [PATCH] context: zero output buffer on allocation (#20781)

* context: zero output buffer on allocation

Address GHSA-wqq9-25mr-rw76.

The logits output buffer allocated in output_reserve() uses
posix_memalign(), which does not zero memory. The buffer is only
written during decode when needs_raw_logits() returns true. When
backend samplers cover all output sequences, needs_raw_logits()
returns false and the buffer is never written, but
llama_get_logits() still returns a pointer to it, exposing stale
heap content.

Zero the buffer after allocation to prevent information disclosure
through the public logits API.
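
As a rough illustration of the zero-after-allocate pattern described
above, here is a minimal standalone sketch. It is not the llama.cpp
code: reserve_zeroed(), all_zero(), output_buf and free_deleter are
hypothetical names, the 64-byte alignment is an arbitrary choice, and
it assumes a POSIX system with IEEE 754 floats (so all-zero bytes read
back as 0.0f).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdlib.h> // posix_memalign (POSIX, not standard C++)

// Hypothetical stand-in for an output buffer; not llama.cpp's actual types.
struct free_deleter {
    void operator()(float * p) const { free(p); }
};
using output_buf = std::unique_ptr<float, free_deleter>;

// Allocate n_floats floats and zero them before handing the buffer out,
// so a reader that runs before any writer never sees stale heap content.
output_buf reserve_zeroed(size_t n_floats) {
    assert(n_floats > 0);
    void * raw = nullptr;
    // posix_memalign() leaves the returned memory uninitialized.
    if (posix_memalign(&raw, 64, n_floats * sizeof(float)) != 0) {
        return output_buf{};
    }
    memset(raw, 0, n_floats * sizeof(float));
    return output_buf(static_cast<float *>(raw));
}

// Returns true if every element in [data, data + n) compares equal to 0.0f.
bool all_zero(const float * data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (data[i] != 0.0f) {
            return false;
        }
    }
    return true;
}
```

With this shape, a caller that reads the buffer before any decode step
has written to it observes zeros instead of whatever the allocator
previously stored there.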

Found-by: Pwno

* Update src/llama-context.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---
 src/llama-context.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/llama-context.cpp b/src/llama-context.cpp
index dc61afb0bd..8f25d47786 100644
--- a/src/llama-context.cpp
+++ b/src/llama-context.cpp
@@ -1946,6 +1946,7 @@ uint32_t llama_context::output_reserve(int32_t n_outputs) {
             LLAMA_LOG_ERROR("%s: failed to allocate output buffer of size %.2f MiB\n", __func__, new_size / (1024.0 * 1024.0));
             return 0;
         }
+        ggml_backend_buffer_clear(buf_output.get(), 0);
     }
 
     float * output_base = (float *) ggml_backend_buffer_get_base(buf_output.get());