15 Commits

Author SHA1 Message Date
samuel b4cbe030ac mtp-batch(chore): Fix logit flags for speculative sampling and remove debug logs 2025-10-11 18:37:40 -03:00
samuel 5e1d719bef mtp-batch (feat): Create and manage sinfo for MTP 2025-10-09 15:21:23 -03:00
samuel febd8235d2 mtp-batch (wip): fix how to warmup kv cache for MTP 2025-10-05 14:43:40 -03:00
samuel 67c6c069e0 mtp-batch (wip): Isolate MTP graph to prevent host embedding buffer corruption 2025-09-27 19:42:32 -03:00
samuel 75dc25e6fe mtp-batch (wip): organize batch for mtp cache 2025-09-27 17:17:00 -03:00
samuel 3da7e7f330 mtp-batch (fix): warm mtp cache for small batch size 2025-09-23 22:45:11 -03:00
samuel 07670a22c6 feat: implemented sampling for MTP 2025-09-03 13:25:21 -03:00
Aaron Lee 9fab53e438 fixed mtp kv cache update step in cases where prompt size > n_batch and n_ubatch 2025-09-02 17:14:09 -04:00
Aaron Lee 6870f9790c added proper KV cache management for MTP layers and slightly refactored 2025-08-17 04:59:36 -04:00
Aaron Lee 6e9bafc7a7 failed attempt to implement MTP; outputs tokens but KV cache management is unreasonable 2025-08-15 23:13:56 -04:00
Aaron Lee cf0f7c0448 broad thrust of the mtp implementation 2025-08-13 02:21:17 -04:00
g2mt 94933c8c2e
server : implement universal assisted decoding (#12635)
* llama-server : implement universal assisted decoding

* Erase prompt tail for kv-cache

* set vocab_dft_compatible in common_speculative

* rename ctx_main to ctx_tgt

* move vocab_dft_compatible to spec struct

* clear mem_dft, remove mem

* detokenize id_last for incompatible models

* update comment

* add --spec-replace flag

* accept special tokens when translating between draft/main models

* Escape spec-replace

* clamp draft result to size to params.n_draft

* fix comment

* clean up code

* restore old example

* log common_speculative_are_compatible in speculative example

* fix

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/speculative.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-31 14:25:23 +02:00
Georgi Gerganov abd4d0bc4f
speculative : update default params (#11954)
* speculative : update default params

* speculative : do not discard the last drafted token
2025-02-19 13:29:42 +02:00
Maxim Evtush 7b891bdc86
fix: typos in documentation files (#11791)
* Update ggml.c

* Update arg.cpp

* Update speculative.h
2025-02-10 23:21:31 +01:00
Georgi Gerganov d9d54e498d
speculative : refactor and add a simpler example (#10362)
* speculative : refactor and add a simpler example

ggml-ci

* speculative : clean-up and add comments and TODOs [no ci]

* speculative : manage context in common_speculative

ggml-ci

* speculative : simplify

ggml-ci

* speculative : simplify (cont)

ggml-ci

* speculative : add --draft-min CLI arg

* speculative : minor fixup

* make : build fixes

* speculative : do not redraft previous drafts

ggml-ci

* speculative : fix the draft sampling

ggml-ci

* speculative : fix compile warning

* common : refactor args

ggml-ci

* common : change defaults [no ci]

* common : final touches

ggml-ci
2024-11-25 09:58:41 +02:00