Merge 8bcd53b74e into 25224c8021

Replace unsafe strnlen() with a bounds-checked loop that scans for \0 within the remaining array size.
fix: OOB reads in UGM tokenizer (precompiled_charsmap handling)
2026-02-13 00:04:41 -08:00 · 2026-01-12 15:53:05 +08:00 · 2026-01-11 16:36:49 +08:00
1 changed file with 12 additions and 1 deletion
--- a/src/llama-vocab.cpp
+++ b/src/llama-vocab.cpp
@@ -797,6 +797,9 @@ struct llm_tokenizer_ugm : llm_tokenizer {

            // First four bytes of precompiled_charsmap contains length of binary
            // blob containing XOR-compressed compact double array (XCDA) entries
+            if (precompiled_charsmap.size() < sizeof(uint32_t)) {
+                throw std::runtime_error("precompiled_charsmap too small for xcda_blob_size header!");
+            }
            uint32_t xcda_blob_size = *(const uint32_t *) &precompiled_charsmap[0];
            charsmap_offset += sizeof(xcda_blob_size);
            if (xcda_blob_size + charsmap_offset >= precompiled_charsmap.size()) {
@@ -1117,7 +1120,15 @@ private:
                throw std::runtime_error("Index out of array bounds in precompiled charsmap!");
            }
            const char * prefix_replacement = &(tokenizer.prefix_replacements)[longest_prefix_offset];
-            return { prefix_replacement, strlen(prefix_replacement), longest_prefix_length };
+            size_t max_len = tokenizer.prefix_replacements_size - longest_prefix_offset;
+            size_t repl_len = 0;
+            while (repl_len < max_len && prefix_replacement[repl_len] != '\0') {
+                repl_len++;
+            }
+            if (repl_len == max_len) {
+                throw std::runtime_error("Unterminated string in precompiled charsmap!");
+            }
+            return { prefix_replacement, repl_len, longest_prefix_length };
        }

        // check if the input prefix contains a valid sequence of UTF-8 code units
Author	SHA1	Message	Date
hourhl	fe5c3ad3ce	Merge `8bcd53b74e` into `25224c8021`	2026-02-13 00:04:41 -08:00
hourhl	8bcd53b74e	Replace unsafe strnlen() with a bounds-checked loop that scans for \0 within the remaining array size.	2026-01-12 15:53:05 +08:00
hourhl	0c0a0dcc88	fix: OOB reads in UGM tokenizer (precompiled_charsmap handling) - Validate minimum size (4 bytes) before reading xcda_blob_size - Use strnlen with bounds check instead of unsafe strlen Both issues allow heap-buffer-overflow from malicious T5/UGM GGUF files.	2026-01-11 16:36:49 +08:00
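
Note: the two checks described in the commits above can be tried outside the tokenizer with a small standalone sketch. This is not code from llama-vocab.cpp. The helper names read_u32_header and bounded_strlen are hypothetical, and the sketch makes two substitutions that the patch does not: it uses std::memcpy for the header read (to sidestep any alignment concern with the pointer cast) and std::memchr in place of the hand-written scan loop. The intended behavior is the same as in the diff: reject a buffer shorter than 4 bytes, and reject a replacement string with no \0 before the end of the array.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: read a native-endian 4-byte length header from the
// start of a buffer, refusing buffers that are too short to contain it.
static uint32_t read_u32_header(const std::vector<char> & buf) {
    if (buf.size() < sizeof(uint32_t)) {
        throw std::runtime_error("buffer too small for 4-byte header");
    }
    uint32_t value = 0;
    std::memcpy(&value, buf.data(), sizeof(value)); // no unaligned dereference
    return value;
}

// Hypothetical helper: length of the NUL-terminated string that starts at
// data[offset], never reading at or beyond data[size].
static size_t bounded_strlen(const char * data, size_t size, size_t offset) {
    if (offset >= size) {
        throw std::runtime_error("offset out of array bounds");
    }
    const char * start = data + offset;
    // memchr inspects at most (size - offset) bytes, so it cannot run past the array.
    const void * nul = std::memchr(start, '\0', size - offset);
    if (nul == nullptr) {
        throw std::runtime_error("unterminated string in array");
    }
    return static_cast<size_t>(static_cast<const char *>(nul) - start);
}
```

With a 5-byte array holding 'a', 'b', '\0', 'c', 'd', calling bounded_strlen at offset 0 yields 2, while offset 3 throws because no terminator exists inside the array. That second case is the malformed-GGUF input that an unbounded strlen would read past.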