llama.cpp

History

Sascha Rogmann 72d3b1898a spec : add self‑speculative decoding (no draft model required) + refactor (#18471 ) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>		2026-01-28 19:42:42 +02:00
..
android	android: fix missing screenshots for Android.md (#18156 )	2025-12-19 09:32:04 +02:00
backend	docs: add linux to index (#18907 )	2026-01-18 18:03:35 +08:00
development	docs : fix links in parsing.md (#18245 )	2025-12-21 09:35:40 +01:00
multimodal	model : support MiniCPM-V 4.5 (#15575 )	2025-08-26 10:05:55 +02:00
ops	ggml webgpu: support for backend sampling (#18880 )	2026-01-16 16:12:43 -08:00
android.md	android: fix missing screenshots for Android.md (#18156 )	2025-12-19 09:32:04 +02:00
build-riscv64-spacemit.md	refactor : remove libcurl, use OpenSSL when available (#18828 )	2026-01-14 18:02:47 +01:00
build-s390x.md	ggml-zdnn: fix #15414 , activate FP16 and BF16 acceleration and incorrect zTensor free (#15839 )	2025-09-13 02:39:52 +08:00
build.md	doc: add build instruction to use Vulkan backend on macos (#19029 )	2026-01-28 12:30:16 +01:00
docker.md	CLI: fixed adding cli and completion into docker containers, improved docs (#18003 )	2025-12-16 11:52:23 +01:00
function-calling.md	common : implement new jinja template engine (#18462 )	2026-01-16 11:22:06 +01:00
install.md	docs : add "Quick start" section for new users (#13862 )	2025-06-03 13:09:36 +02:00
llguidance.md	llguidance build fixes for Windows (#11664 )	2025-02-14 12:46:08 -08:00
multimodal.md	mtmd : add support for Voxtral (#14862 )	2025-07-28 15:01:48 +02:00
ops.md	ggml webgpu: support for backend sampling (#18880 )	2026-01-16 16:12:43 -08:00
preset.md	preset: allow named remote preset (#18728 )	2026-01-10 15:12:29 +01:00
speculative.md	spec : add self‑speculative decoding (no draft model required) + refactor (#18471 )	2026-01-28 19:42:42 +02:00