llama.cpp


Daniel Bevenius 1138d5c2d9 sampling : support multiple outputs per sequence	2026-02-27 05:40:49 +01:00
This commit adds support for multiple outputs per sequence in the backend sampling implementation. The main motivation for this change is to be able to support speculative decoding using backend samplers where multiple outputs for the same sequence would be needed.
peg-parser	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00
.gitignore	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
CMakeLists.txt	tests : enable test-chat out of tree build (#19558)	2026-02-27 05:37:54 +01:00
get-model.cpp	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
get-model.h	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs	llama : move end-user examples to tools directory (#13249)	2025-05-02 20:27:13 +02:00
test-alloc.cpp	ggml : fix graph reallocation with multiple chunks (#16396)	2025-10-03 13:49:08 +02:00
test-arg-parser.cpp	ci, tests : use cmake to download models and remove libcurl dependency (#18791)	2026-01-14 07:46:27 +01:00
test-autorelease.cpp	docs : Minor cleanups (#19252)	2026-02-02 08:38:55 +02:00
test-backend-ops.cpp	test: mul_mat tests with huge batch size (#19519)	2026-02-19 20:08:25 -06:00
test-backend-sampler.cpp	sampling : support multiple outputs per sequence	2026-02-27 05:40:49 +01:00
test-barrier.cpp	Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (#17748)	2025-12-10 12:32:23 -08:00
test-c.c	ggml : remove kompute backend (#14501)	2025-07-03 07:48:32 +03:00
test-chat-parser.cpp	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
test-chat-peg-parser.cpp	cli : fix reasoning responses in CLI (#18961)	2026-01-20 18:23:25 +01:00
test-chat-template.cpp	jinja : do not pass empty tools and add some none filters (#19176)	2026-01-29 14:06:54 +01:00
test-chat.cpp	common : merge qwen3-coder and nemotron nano 3 parsers (#19765)	2026-02-20 23:22:22 +01:00
test-double-float.cpp	ggml : minor naming changes (#8433)	2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp	cmake : do not include ./src as public for libllama (#13062)	2025-04-24 16:00:10 +03:00
test-gguf.cpp	ggml/gguf : prevent integer overflows (#19856)	2026-02-24 20:17:11 +02:00
test-grammar-integration.cpp	llama : add token matching support to llama-grammar (#17816)	2025-12-09 00:32:57 -06:00
test-grammar-llguidance.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-grammar-parser.cpp	llama : add token matching support to llama-grammar (#17816)	2025-12-09 00:32:57 -06:00
test-jinja.cpp	jinja: correct stats for tojson and string filters (#19785)	2026-02-22 21:08:23 +01:00
test-json-partial.cpp	common : handle unicode during partial json parsing (#16526)	2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
test-llama-grammar.cpp	llama : add token matching support to llama-grammar (#17816)	2025-12-09 00:32:57 -06:00
test-log.cpp	common : use common_ prefix for common library functions (#9805)	2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
test-model-load-cancel.cpp	llama : update llama_model API names (#11063)	2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c	mtmd : add C public API (#13184)	2025-05-04 23:43:42 +02:00
test-opt.cpp	tests : fix test-opt with GGML_BACKEND_DL (#15599)	2025-08-26 22:14:38 +02:00
test-peg-parser.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
test-quantize-fns.cpp	tests : fix test-quantize-fns to init the CPU backend (#12306)	2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp	ci: run the x64 and arm ci on the github machines instead (#16183)	2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp	server: introduce API for serving / loading / unloading multiple models (#17470)	2025-12-01 19:41:04 +01:00
test-regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
test-rope.cpp	ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805)	2025-11-11 13:33:24 +02:00
test-sampling.cpp	sampling : optimize samplers by reusing bucket sort (#15665)	2025-08-31 20:41:02 +03:00
test-state-restore-fragmented.cpp	kv-cache: Fix state restore fragmented cache (#17982)	2025-12-15 19:28:35 +02:00
test-thread-safety.cpp	server : support unified cache across slots (#16736)	2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-0.py	py : logging and flake8 suppression refactoring (#7081)	2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh	model : add Jina Embeddings v5 Nano (partial EuroBERT) support (#19826)	2026-02-26 12:14:09 +01:00
test-tokenizer-1-bpe.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-1-spm.cpp	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
test-tokenizer-random.py	requirements : update transformers/torch for Embedding Gemma (#15828)	2025-09-09 06:06:52 +02:00
test-tokenizers-repo.sh	devops: add s390x & ppc64le CI (#15925)	2025-09-27 02:03:33 +08:00
testing.h	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00