llama.cpp/common
Sascha Rogmann 72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
..
jinja jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147) 2026-01-28 14:40:29 +01:00
CMakeLists.txt spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
arg.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
arg.h vendor : update cpp-httplib to 0.30.0 (#18660) 2026-01-08 13:53:54 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.cpp server : support preserving reasoning_content in assistant message (#18994) 2026-01-22 21:30:06 +01:00
chat-parser.h cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
chat-peg-parser.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
chat-peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat.cpp chat: fix language input for translategemma (#19052) 2026-01-24 17:58:45 +01:00
chat.h server : support preserving reasoning_content in assistant message (#18994) 2026-01-22 21:30:06 +01:00
common.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
common.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
console.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
console.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
debug.cpp Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) 2026-01-14 20:29:35 +01:00
debug.h Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914) 2026-01-14 20:29:35 +01:00
download.cpp common, server : use the same User-Agent by default (#18957) 2026-01-20 18:28:43 +01:00
download.h preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
http.h common : clarify HTTPS build options in error message (#19103) 2026-01-27 06:16:00 +01:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h cli : fix reasoning responses in CLI (#18961) 2026-01-20 18:23:25 +01:00
json-schema-to-grammar.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
llguidance.cpp sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-cache.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-map.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
ngram-map.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
peg-parser.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
preset.cpp preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
preset.h common: support remote preset (#18520) 2026-01-08 22:35:40 +01:00
regex-partial.cpp common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342) 2026-01-03 16:02:43 -06:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp llama : add adaptive-p sampler (#17927) 2026-01-15 19:16:29 +02:00
sampling.h sampling : add support for backend sampling (#17004) 2026-01-04 22:22:16 +02:00
speculative.cpp spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
speculative.h spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00
unicode.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
unicode.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00