commit 912ed2cd9339d1b2875d98744ca5b51fa62e581e
Author: samuel <samueloliveira32df@gmail.com>
Date: Sun Dec 7 23:00:29 2025 -0300
speculative (feat): implement recursive MTP drafting for GLM-4.5
commit bdf72d9552e3da64ffc85f175664713388752914
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Dec 6 16:10:16 2025 -0300
sampling (feat): optimize speculative drafting with fast-path selection
commit a91980a8f3475a6bbac0a64d8be06dd4b613020e
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Dec 6 15:18:19 2025 -0300
mtp (chore): clean old code
commit 6de0ecf55db8567db4faa99b0152b72c9e854548
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Dec 6 14:40:13 2025 -0300
mtp (feat): add mtp arg
commit ea77394183b8e6c368af969b8274039a54b11486
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Dec 6 13:47:54 2025 -0300
mtp-graph (fix): move llama_get_logits_ith outside the loop
commit 15dff208958fb66802f20ec53ce5fcaff133edb7
Merge: 171346c74 cae85fe53
Author: samuel <samueloliveira32df@gmail.com>
Date: Thu Oct 16 13:44:41 2025 -0300
Merge branch 'glm4-mtp-batch' of https://github.com/SamuelOliveirads/llama.cpp into glm4-mtp-graph-cache
commit cae85fe531876762ee02524fc4c3f6c5e7824c63
Author: samuel <samueloliveira32df@gmail.com>
Date: Thu Oct 16 13:42:31 2025 -0300
mtp-batch(fix): avoid logits for mtp kv cache operations
commit 171346c742c310bbcfbd786b61250638ccf8b44d
Author: samuel <samueloliveira32df@gmail.com>
Date: Sun Oct 12 16:33:01 2025 -0300
mtp-graph(feat): Reactivate graph reuse only for main model path
commit 0127c6beeb384ec3abbc18b22dbe830f22fcf4b4
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Oct 11 22:20:54 2025 -0300
mtp-batch(chore): Remove final MTP debug logs and dead code
commit 4bcc9e261ef57ee4cfaa65d06bcd0fcdeacf7797
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Oct 11 18:51:22 2025 -0300
mtp-batch(fix): Correctly advance cache head and add MTP documentation
commit b4cbe030ac25056717763b812d1dd89681c08522
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Oct 11 18:37:40 2025 -0300
mtp-batch(chore): Fix logit flags for speculative sampling and remove debug logs
commit a99709d0c1401d0b447dce1bd0101fb56390f50e
Author: samuel <samueloliveira32df@gmail.com>
Date: Fri Oct 10 17:24:34 2025 -0300
mtp-batch(refactor): Extract decode context and MTP input logic into helper methods
commit 913af8f48d2dab1d9e907cf6c48c921a229a295c
Author: samuel <samueloliveira32df@gmail.com>
Date: Fri Oct 10 16:44:28 2025 -0300
mtp-batch(refactor): Replace MTP boolean flags with an explicit operation enum
commit 6f74ba38070d62d37bc0fb71ce9871e1a4ffabcc
Author: samuel <samueloliveira32df@gmail.com>
Date: Thu Oct 9 22:27:18 2025 -0300
mtp-batch (fix): prevent mtp draft from polluting the cache
commit 5e1d719beffccf8c22784c24b52ff6f5ab56b9ff
Author: samuel <samueloliveira32df@gmail.com>
Date: Thu Oct 9 15:21:23 2025 -0300
mtp-batch (feat): Create and manage sinfo for MTP
commit febd8235d27fe9174ee4b54ea7a10e630939fee0
Author: samuel <samueloliveira32df@gmail.com>
Date: Sun Oct 5 14:43:40 2025 -0300
mtp-batch (wip): fix how to warmup kv cache for MTP
commit 67c6c069e0a5496adfd7d8aa6ca7514db5a6f437
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Sep 27 19:42:32 2025 -0300
mtp-batch (wip): Isolate MTP graph to prevent host embedding buffer corruption
commit 75dc25e6fe781c1b65038d69390fb778d760e3a1
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Sep 27 17:17:00 2025 -0300
mtp-batch (wip): organize batch for mtp cache
commit 3da7e7f3309dbb576538850c92c1cbf8fdc6d6ee
Author: samuel <samueloliveira32df@gmail.com>
Date: Tue Sep 23 22:45:11 2025 -0300
mtp-batch (fix): warm mtp cache for small batch size
commit df64508b937784112168aa099644b60fef015f05
Author: samuel <samueloliveira32df@gmail.com>
Date: Sun Sep 21 21:55:41 2025 -0300
mtp-batch (wip): merge glm graphs
commit 042eb8a829876ed175320df9c8133bcea0c40460
Author: samuel <samueloliveira32df@gmail.com>
Date: Sun Sep 21 21:29:00 2025 -0300
mtp-batch (wip): merge mtp and model graph
commit 1318b2de82716710b9853e07bd640443a5a025bb
Author: samuel <samueloliveira32df@gmail.com>
Date: Sun Sep 14 10:22:59 2025 -0300
mtp-batch (wip): move mtp execution to batch format
commit c6237c71ffd4485df1c35829c380b63e472fc5dd
Merge: 9fab53e43 8742ce0e3
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Sat Sep 13 02:57:01 2025 -0400
Merge pull request #1 from SamuelOliveirads/glm4-moe-mtp
feat: implemented sampling for MTP
commit 8742ce0e39823eeb101bb5b6099ff4ca7be10c6e
Author: samuel <samueloliveira32df@gmail.com>
Date: Sat Sep 6 00:21:18 2025 -0300
feat: apply logits + greedy sampler
commit 5a5bce85777041d841393b4396e28f8e3065bb10
Author: samuel <samueloliveira32df@gmail.com>
Date: Wed Sep 3 17:56:14 2025 -0300
fix: add sample acceptance
commit 07670a22c63b1fa335d6ec1c4a1e4255a920848c
Author: samuel <samueloliveira32df@gmail.com>
Date: Wed Sep 3 13:25:21 2025 -0300
feat: implemented sampling for MTP
commit 9fab53e4388c20aef497efd82e86dcb99ca58064
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Tue Sep 2 17:14:09 2025 -0400
fixed mtp kv cache update step in cases where prompt size > n_batch and n_ubatch
commit 98bc0c6bf223f425f4ecea14f13fc46101f1b44a
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Tue Aug 26 01:26:51 2025 -0400
replace standard sampler with greedy sampler for mtp draft
commit 471e026327cca9f6f58aeefe32129a6cb9390f4f
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Tue Aug 19 23:10:56 2025 -0400
fixed vram leak
commit d72f9d5691054958cd1b139f228e5e588d3974cf
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Tue Aug 19 01:50:34 2025 -0400
kludge-y kv cache management of mtp layer
commit 382135aa3619294ab8bf87b0de4b1255ab7942f0
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Sun Aug 17 21:54:45 2025 -0400
fixed mtp kv cache update sequencing after prompt processing
commit 6870f9790c1bb1d0254241267b1a6c8a7fc82830
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Sun Aug 17 04:59:36 2025 -0400
added proper KV cache management for MTP layers and slightly refactored
commit 6e9bafc7a738b4c99f9440c0ec461e08cf6ce702
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Fri Aug 15 23:13:56 2025 -0400
failed attempt to implement MTP; outputs tokens but KV cache management is unreasonable
commit cf0f7c0448c2c1736588673114558e5829db7879
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Wed Aug 13 02:21:17 2025 -0400
broad thrust of the mtp implementation
commit 03231da69eec20677e25e2307d4fe31ac2ede034
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Tue Aug 12 01:03:59 2025 -0400
add model member function to build mtp graph, to be called from speculative.cpp
commit 1f477b375504aa557ed21066aa6783b11781a179
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Mon Aug 11 20:54:45 2025 -0400
make nextn weights loadable without a crash
commit e434f87cc739a1901931d88e33f777170a4e18e7
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Mon Aug 11 01:21:47 2025 -0400
some work towards building mtp layer graph
commit db60623e7926fb151b3cc63f029929122cac342a
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date: Sun Aug 10 23:52:54 2025 -0400
added getter for nextn layer count and server slot has_mtp property
* llama-server : implement universal assisted decoding
* Erase prompt tail for kv-cache
* set vocab_dft_compatible in common_speculative
* rename ctx_main to ctx_tgt
* move vocab_dft_compatible to spec struct
* clear mem_dft, remove mem
* detokenize id_last for incompatible models
* update comment
* add --spec-replace flag
* accept special tokens when translating between draft/main models
* Escape spec-replace
* clamp draft result to size to params.n_draft
* fix comment
* clean up code
* restore old example
* log common_speculative_are_compatible in speculative example
* fix
* Update common/speculative.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/speculative.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update common/speculative.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>