Sascha Rogmann
|
72d3b1898a
|
spec : add self‑speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate()
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common: use %zu format specifier for size_t in logging
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471)
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2026-01-28 19:42:42 +02:00 |
Georgi Gerganov
|
727368c60f
|
llama : use LLAMA_TOKEN_NULL (#11062)
ggml-ci
|
2025-01-06 10:52:15 +02:00 |
Diego Devesa
|
7eee341bee
|
common : use common_ prefix for common library functions (#9805)
* common : use common_ prefix for common library functions
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2024-10-10 22:57:42 +02:00 |
Johannes Gäßler
|
e02b597be3
|
lookup: fibonacci hashing, fix crashes (#8548)
|
2024-07-17 23:35:44 +02:00 |
Johannes Gäßler
|
50ccaf5eac
|
lookup: complement data from context with general text statistics (#5479)
* lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
* fixup! lookup: evaluation tools, use corpus/previous gens
|
2024-03-23 01:24:36 +01:00 |