llama.cpp/docs
Sascha Rogmann 72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
..
android android: fix missing screenshots for Android.md (#18156) 2025-12-19 09:32:04 +02:00
backend docs: add linux to index (#18907) 2026-01-18 18:03:35 +08:00
development docs : fix links in parsing.md (#18245) 2025-12-21 09:35:40 +01:00
multimodal model : support MiniCPM-V 4.5 (#15575) 2025-08-26 10:05:55 +02:00
ops ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
android.md android: fix missing screenshots for Android.md (#18156) 2025-12-19 09:32:04 +02:00
build-riscv64-spacemit.md refactor : remove libcurl, use OpenSSL when available (#18828) 2026-01-14 18:02:47 +01:00
build-s390x.md ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839) 2025-09-13 02:39:52 +08:00
build.md doc: add build instruction to use Vulkan backend on macos (#19029) 2026-01-28 12:30:16 +01:00
docker.md CLI: fixed adding cli and completion into docker containers, improved docs (#18003) 2025-12-16 11:52:23 +01:00
function-calling.md common : implement new jinja template engine (#18462) 2026-01-16 11:22:06 +01:00
install.md docs : add "Quick start" section for new users (#13862) 2025-06-03 13:09:36 +02:00
llguidance.md llguidance build fixes for Windows (#11664) 2025-02-14 12:46:08 -08:00
multimodal.md mtmd : add support for Voxtral (#14862) 2025-07-28 15:01:48 +02:00
ops.md ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
preset.md preset: allow named remote preset (#18728) 2026-01-10 15:12:29 +01:00
speculative.md spec : add self‑speculative decoding (no draft model required) + refactor (#18471) 2026-01-28 19:42:42 +02:00