llama.cpp

Andrea Arcangeli 990e4d9698 common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)

* grammar: add test case for nullable symbol loop

Reproduce stack overflow (or OOM) with ( [x]* )* found while adding GBNF support to ripgrep-edit.

llama-server reproducer:

    curl \
      -X POST \
      -d '{ "messages": [{ "role": "user", "content": "write yes" }], "grammar": "root ::= ( [x]* )*" }' \
      -H "Content-Type: application/json" \
      http://localhost:8811/v1/chat/completions

grammar: prevent stack overflow with nullable symbol loop

Fix a potential stack overflow in llama_grammar_advance_stack that could occur when processing grammars with nullable symbols that lead to infinite derivations of empty strings. The fix introduces cycle detection by tracking visited stacks to prevent infinite recursion.

rg-edit regexp: llama_grammar_advance_stack
rg-edit extra-args: -A20
rg-edit directive: """Rewrite: fix the following segfault:

[..]
⚫ Testing segfault. Grammar:
root ::= ( [x]* )*

root ::= ( [x]* )*

Segmentation fault build/bin/test-grammar-integration"""

gptel-context:
(("~/llama.cpp/src/llama-grammar.cpp")
 ("~/llama.cpp/tests/test-grammar-integration.cpp")
 ("~/llama.cpp/grammars/./list.gbnf")
 ("~/llama.cpp/grammars/./json_arr.gbnf")
 ("~/llama.cpp/grammars/./json.gbnf")
 ("~/llama.cpp/grammars/./japanese.gbnf")
 ("~/llama.cpp/grammars/./english.gbnf")
 ("~/llama.cpp/grammars/./chess.gbnf")
 ("~/llama.cpp/grammars/./c.gbnf")
 ("~/llama.cpp/grammars/./arithmetic.gbnf")
 ("~/llama.cpp/grammars/./README.md"))

* grammar: convert recursive llama_grammar_advance_stack to iterative

This change converts the function to an iterative approach using explicit stacks, which prevents deep recursion and eliminates the risk of stack overflow.

rg-edit regexp: llama_grammar_advance_stack
rg-edit extra-args: -A30
rg-edit directive: """Rewrite: fix the following segfault:

[..]
⚫ Testing segfault. Grammar:
root ::= ( [x]* )*

root ::= ( [x]* )*

Segmentation fault build/bin/test-grammar-integration

convert from recursive to interactive"""

gptel-context:
(("~/llama.cpp/src/llama-grammar.cpp")
 ("~/llama.cpp/tests/test-grammar-integration.cpp")
 ("~/llama.cpp/grammars/./list.gbnf")
 ("~/llama.cpp/grammars/./json_arr.gbnf")
 ("~/llama.cpp/grammars/./json.gbnf")
 ("~/llama.cpp/grammars/./japanese.gbnf")
 ("~/llama.cpp/grammars/./english.gbnf")
 ("~/llama.cpp/grammars/./chess.gbnf")
 ("~/llama.cpp/grammars/./c.gbnf")
 ("~/llama.cpp/grammars/./arithmetic.gbnf")
 ("~/llama.cpp/grammars/./README.md"))

v2: Added a `std::set` to perform tree-based lookups with O(N log N) complexity. Testing with a parallel run of `test-grammar-integration` shows a double-digit percentage increase in runtime. An `unordered_set` with O(1) hashing was also evaluated, but the overhead of constructing hash keys from pointers made it significantly slower than the rbtree implementation that only requires an ordering operator. The performance regression in the test suite appears justified by the overall reduction in algorithmic complexity.

Co-developed-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

* grammar: add test case for hang in repetition grammar processing

This commit adds a new test case to the grammar integration tests that specifically targets a hang scenario in the repetition grammar parser found while adding GBNF support to ripgrep-edit.

llama-server reproducer:

    curl \
      -X POST \
      -d '{ "messages": [{ "role": "user", "content": "write yes" }], "grammar": "root ::= (([^x]){0,99}){0,99}" }' \
      -H "Content-Type: application/json" \
      http://localhost:8811/v1/chat/completions

grammar: add repetition threshold check

The change introduces a maximum repetition threshold to avoid excessive rule expansion during grammar parsing. When parsing repetition patterns like {m,n}, the parser now calculates the potential number of rules that would be generated and throws an error if the product of previous rules and new rules exceeds the threshold. A test case was added to verify the threshold is properly enforced for deeply nested repetition patterns that would otherwise cause hangs.

2026-03-21 18:43:35 +01:00
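The cycle detection described in the commit message above can be illustrated with a toy model. This is only a minimal sketch, not the code in src/llama-grammar.cpp: the types `symbol`, `sym_stack`, `toy_grammar` and the function `advance_stacks` are hypothetical names invented for this example, and the sketch assumes left-recursive grammars are rejected elsewhere. It expands the top nonterminal of each stack with an explicit worklist instead of recursion, and uses a `std::set` of already-visited stacks so that a nullable loop such as ( [x]* )* cannot be expanded forever.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy model (not llama.cpp's data structures):
// symbols >= 0 are nonterminals, symbols < 0 are terminals.
using symbol       = int;
using sym_stack    = std::vector<symbol>;               // top of stack is back()
using alternatives = std::vector<std::vector<symbol>>;  // an empty alternative is epsilon
using toy_grammar  = std::map<symbol, alternatives>;

// Expand stacks until each one is empty or has a terminal on top.
// Iterative: an explicit worklist replaces recursion.
// Cycle-safe: a stack that was already seen is never expanded again.
std::set<sym_stack> advance_stacks(const toy_grammar & g, const sym_stack & start) {
    std::set<sym_stack>    visited;
    std::set<sym_stack>    result;
    std::vector<sym_stack> todo{start};

    while (!todo.empty()) {
        sym_stack cur = std::move(todo.back());
        todo.pop_back();

        // insert() reports whether the stack is new; skip repeats to break cycles
        if (!visited.insert(cur).second) {
            continue;
        }
        // finished stack: nothing left, or a terminal is waiting on top
        if (cur.empty() || cur.back() < 0) {
            result.insert(cur);
            continue;
        }

        const symbol nt = cur.back();
        cur.pop_back();

        const auto it = g.find(nt);
        if (it == g.end()) {
            continue; // undefined nonterminal: no expansion in this toy model
        }
        for (const auto & alt : it->second) {
            sym_stack next = cur;
            // push the alternative reversed so its first symbol ends up on top
            next.insert(next.end(), alt.rbegin(), alt.rend());
            todo.push_back(std::move(next));
        }
    }
    return result;
}
```

With root (0) defined as `inner root | epsilon` and inner (1) as `x inner | epsilon`, where x is the terminal -1, the stack {0} reappears after inner derives the empty string. The visited set drops that repeat, so the loop terminates with two finished stacks. An ordered set is used here because it only needs `operator<` on the stacks, which mirrors the trade-off mentioned in the v2 note.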
peg-parser	common: consolidate PEG string parsers (#20263)	2026-03-10 00:29:21 +01:00
.gitignore	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
CMakeLists.txt	test-backend-ops: allow loading tests from file and parsing model operators into file (#19896)	2026-03-12 13:26:00 +01:00
export-graph-ops.cpp	test-backend-ops: allow loading tests from file and parsing model operators into file (#19896)	2026-03-12 13:26:00 +01:00
get-model.cpp	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
get-model.h	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
gguf-model-data.cpp	tests : model metadata loading from huggingface (#19796)	2026-02-28 10:44:38 +01:00
gguf-model-data.h	tests : model metadata loading from huggingface (#19796)	2026-02-28 10:44:38 +01:00
run-json-schema-to-grammar.mjs	llama : move end-user examples to tools directory (#13249)	2025-05-02 20:27:13 +02:00
test-alloc.cpp	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
test-arg-parser.cpp	ci, tests : use cmake to download models and remove libcurl dependency (#18791)	2026-01-14 07:46:27 +01:00
test-autorelease.cpp	docs : Minor cleanups (#19252)	2026-02-02 08:38:55 +02:00
test-backend-ops.cpp	metal : add FA specialization for HSK = 320, HSV = 256 (#20549)	2026-03-14 23:15:47 +02:00
test-backend-sampler.cpp	tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645)	2026-03-18 17:40:22 +08:00
test-barrier.cpp	Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748)	2025-12-10 12:32:23 -08:00
test-c.c	ggml : remove kompute backend (#14501)	2025-07-03 07:48:32 +03:00
test-chat-auto-parser.cpp	common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825)	2026-03-21 00:19:04 +01:00
test-chat-peg-parser.cpp	common/parser: add proper reasoning tag prefill reading (#20424)	2026-03-19 16:58:21 +01:00
test-chat-template.cpp	Autoparser - complete refactoring of parser architecture (#18675)	2026-03-06 21:01:00 +01:00
test-chat.cpp	common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825)	2026-03-21 00:19:04 +01:00
test-double-float.cpp	ggml : minor naming changes (#8433)	2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-gguf-model-data.cpp	tests : model metadata loading from huggingface (#19796)	2026-02-28 10:44:38 +01:00
test-gguf.cpp	ggml/gguf : prevent integer overflows (#19856)	2026-02-24 20:17:11 +02:00
test-grammar-integration.cpp	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)	2026-03-21 18:43:35 +01:00
test-grammar-llguidance.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-grammar-parser.cpp	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)	2026-03-21 18:43:35 +01:00
test-jinja.cpp	tests : fix test-jinja-py Windows failures by bypassing command-line args [no ci] (#20483)	2026-03-18 10:43:31 +01:00
test-json-partial.cpp	common : handle unicode during partial json parsing (#16526)	2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp	examples : fix empty items in json_schema_to_grammar.py [no ci] (#19968)	2026-03-10 14:38:18 +01:00
test-llama-archs.cpp	model: mistral small 4 support (#20649)	2026-03-17 00:31:14 +01:00
test-llama-grammar.cpp	common/grammar: fix grammar parsing issues to prevent stack overflow and hangs (#18604)	2026-03-21 18:43:35 +01:00
test-log.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
test-model-load-cancel.cpp	llama : update llama_model API names (#11063)	2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c	mtmd : add C public API (#13184)	2025-05-04 23:43:42 +02:00
test-opt.cpp	tests : fix test-opt with GGML_BACKEND_DL (#15599)	2025-08-26 22:14:38 +02:00
test-peg-parser.cpp	Autoparser - complete refactoring of parser architecture (#18675)	2026-03-06 21:01:00 +01:00
test-quantize-fns.cpp	ggml : add NVFP4 quantization type support (#19769)	2026-03-11 21:02:54 +01:00
test-quantize-perf.cpp	ci: run the x64 and arm ci on the github machines instead (#16183)	2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp	server: introduce API for serving / loading / unloading multiple models (#17470)	2025-12-01 19:41:04 +01:00
test-reasoning-budget.cpp	common/parser: handle reasoning budget (#20297)	2026-03-11 10:26:12 +01:00
test-regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
test-rope.cpp	ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805)	2025-11-11 13:33:24 +02:00
test-sampling.cpp	sampling : optimize samplers by reusing bucket sort (#15665)	2025-08-31 20:41:02 +03:00
test-state-restore-fragmented.cpp	kv-cache: Fix state restore fragmented cache (#17982)	2025-12-15 19:28:35 +02:00
test-thread-safety.cpp	server : support unified cache across slots (#16736)	2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-0.py	py : logging and flake8 suppression refactoring (#7081)	2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh	model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826)	2026-02-26 12:14:09 +01:00
test-tokenizer-1-bpe.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-1-spm.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-random.py	ci : switch from pyright to ty (#20826)	2026-03-21 08:54:34 +01:00
test-tokenizers-repo.sh	devops: add s390x & ppc64le CI (#15925)	2025-09-27 02:03:33 +08:00
testing.h	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00