llama.cpp

Commit Graph

Author	SHA1	Message	Date
samuel	fe2baf5e2d	Squashed commit of the following: commit 912ed2cd9339d1b2875d98744ca5b51fa62e581e Author: samuel <samueloliveira32df@gmail.com> Date: Sun Dec 7 23:00:29 2025 -0300 speculative (feat): implement recursive MTP drafting for GLM-4.5 commit bdf72d9552e3da64ffc85f175664713388752914 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Dec 6 16:10:16 2025 -0300 sampling (feat): optimize speculative drafting with fast-path selection commit a91980a8f3475a6bbac0a64d8be06dd4b613020e Author: samuel <samueloliveira32df@gmail.com> Date: Sat Dec 6 15:18:19 2025 -0300 mtp (chore): clean old code commit 6de0ecf55db8567db4faa99b0152b72c9e854548 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Dec 6 14:40:13 2025 -0300 mtp (feat): add mtp arg commit ea77394183b8e6c368af969b8274039a54b11486 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Dec 6 13:47:54 2025 -0300 mtp-graph (fix): move llama_get_logits_ith outside the loop commit 15dff208958fb66802f20ec53ce5fcaff133edb7 Merge: 171346c74 cae85fe53 Author: samuel <samueloliveira32df@gmail.com> Date: Thu Oct 16 13:44:41 2025 -0300 Merge branch 'glm4-mtp-batch' of https://github.com/SamuelOliveirads/llama.cpp into glm4-mtp-graph-cache commit cae85fe531876762ee02524fc4c3f6c5e7824c63 Author: samuel <samueloliveira32df@gmail.com> Date: Thu Oct 16 13:42:31 2025 -0300 mtp-batch(fix): avoid logits for mtp kv cache operations commit 171346c742c310bbcfbd786b61250638ccf8b44d Author: samuel <samueloliveira32df@gmail.com> Date: Sun Oct 12 16:33:01 2025 -0300 mtp-graph(feat): Reactivate graph reuse only for main model path commit 0127c6beeb384ec3abbc18b22dbe830f22fcf4b4 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Oct 11 22:20:54 2025 -0300 mtp-batch(chore): Remove final MTP debug logs and dead code commit 4bcc9e261ef57ee4cfaa65d06bcd0fcdeacf7797 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Oct 11 18:51:22 2025 -0300 mtp-batch(fix): Correctly advance cache head and add MTP documentation commit b4cbe030ac25056717763b812d1dd89681c08522 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Oct 11 18:37:40 2025 -0300 mtp-batch(chore): Fix logit flags for speculative sampling and remove debug logs commit a99709d0c1401d0b447dce1bd0101fb56390f50e Author: samuel <samueloliveira32df@gmail.com> Date: Fri Oct 10 17:24:34 2025 -0300 mtp-batch(refactor): Extract decode context and MTP input logic into helper methods commit 913af8f48d2dab1d9e907cf6c48c921a229a295c Author: samuel <samueloliveira32df@gmail.com> Date: Fri Oct 10 16:44:28 2025 -0300 mtp-batch(refactor): Replace MTP boolean flags with an explicit operation enum commit 6f74ba38070d62d37bc0fb71ce9871e1a4ffabcc Author: samuel <samueloliveira32df@gmail.com> Date: Thu Oct 9 22:27:18 2025 -0300 mtp-batch (fix): prevent mtp draft from polluting the cache commit 5e1d719beffccf8c22784c24b52ff6f5ab56b9ff Author: samuel <samueloliveira32df@gmail.com> Date: Thu Oct 9 15:21:23 2025 -0300 mtp-batch (feat): Create and manage sinfo for MTP commit febd8235d27fe9174ee4b54ea7a10e630939fee0 Author: samuel <samueloliveira32df@gmail.com> Date: Sun Oct 5 14:43:40 2025 -0300 mtp-batch (wip): fix how to warmup kv cache for MTP commit 67c6c069e0a5496adfd7d8aa6ca7514db5a6f437 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Sep 27 19:42:32 2025 -0300 mtp-batch (wip): Isolate MTP graph to prevent host embedding buffer corruption commit 75dc25e6fe781c1b65038d69390fb778d760e3a1 Author: samuel <samueloliveira32df@gmail.com> Date: Sat Sep 27 17:17:00 2025 -0300 mtp-batch (wip): organize batch for mtp cache commit 3da7e7f3309dbb576538850c92c1cbf8fdc6d6ee Author: samuel <samueloliveira32df@gmail.com> Date: Tue Sep 23 22:45:11 2025 -0300 mtp-batch (fix): warm mtp cache for small batch size commit df64508b937784112168aa099644b60fef015f05 Author: samuel <samueloliveira32df@gmail.com> Date: Sun Sep 21 21:55:41 2025 -0300 mtp-batch (wip): merge glm graphs commit 042eb8a829876ed175320df9c8133bcea0c40460 Author: samuel <samueloliveira32df@gmail.com> Date: Sun Sep 21 21:29:00 2025 -0300 mtp-batch (wip): merge mtp and model graph commit 1318b2de82716710b9853e07bd640443a5a025bb Author: samuel <samueloliveira32df@gmail.com> Date: Sun Sep 14 10:22:59 2025 -0300 mtp-batch (wip): move mtp execution to batch format commit c6237c71ffd4485df1c35829c380b63e472fc5dd Merge: 9fab53e43 8742ce0e3 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Sat Sep 13 02:57:01 2025 -0400 Merge pull request #1 from SamuelOliveirads/glm4-moe-mtp feat: implemented sampling for MTP commit 8742ce0e39823eeb101bb5b6099ff4ca7be10c6e Author: samuel <samueloliveira32df@gmail.com> Date: Sat Sep 6 00:21:18 2025 -0300 feat: apply logits + greedy sampler commit 5a5bce85777041d841393b4396e28f8e3065bb10 Author: samuel <samueloliveira32df@gmail.com> Date: Wed Sep 3 17:56:14 2025 -0300 fix: add sample acceptance commit 07670a22c63b1fa335d6ec1c4a1e4255a920848c Author: samuel <samueloliveira32df@gmail.com> Date: Wed Sep 3 13:25:21 2025 -0300 feat: implemented sampling for MTP commit 9fab53e4388c20aef497efd82e86dcb99ca58064 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Tue Sep 2 17:14:09 2025 -0400 fixed mtp kv cache update step in cases where prompt size > n_batch and n_ubatch commit 98bc0c6bf223f425f4ecea14f13fc46101f1b44a Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Tue Aug 26 01:26:51 2025 -0400 replace standard sampler with greedy sampler for mtp draft commit 471e026327cca9f6f58aeefe32129a6cb9390f4f Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Tue Aug 19 23:10:56 2025 -0400 fixed vram leak commit d72f9d5691054958cd1b139f228e5e588d3974cf Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Tue Aug 19 01:50:34 2025 -0400 kludge-y kv cache management of mtp layer commit 382135aa3619294ab8bf87b0de4b1255ab7942f0 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Sun Aug 17 21:54:45 2025 -0400 fixed mtp kv cache update sequencing after prompt processing commit 6870f9790c1bb1d0254241267b1a6c8a7fc82830 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Sun Aug 17 04:59:36 2025 -0400 added proper KV cache management for MTP layers and slightly refactored commit 6e9bafc7a738b4c99f9440c0ec461e08cf6ce702 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Fri Aug 15 23:13:56 2025 -0400 failed attempt to implement MTP; outputs tokens but KV cache management is unreasonable commit cf0f7c0448c2c1736588673114558e5829db7879 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Wed Aug 13 02:21:17 2025 -0400 broad thrust of the mtp implementation commit 03231da69eec20677e25e2307d4fe31ac2ede034 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Tue Aug 12 01:03:59 2025 -0400 add model member function to build mtp graph, to be called from speculative.cpp commit 1f477b375504aa557ed21066aa6783b11781a179 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Mon Aug 11 20:54:45 2025 -0400 make nextn weights loadable without a crash commit e434f87cc739a1901931d88e33f777170a4e18e7 Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Mon Aug 11 01:21:47 2025 -0400 some work towards building mtp layer graph commit db60623e7926fb151b3cc63f029929122cac342a Author: Aaron Lee <lee.aaron.65@gmail.com> Date: Sun Aug 10 23:52:54 2025 -0400 added getter for nextn layer count and server slot has_mtp property	2025-12-21 17:23:35 -05:00
g2mt	94933c8c2e	server : implement universal assisted decoding (#12635 ) * llama-server : implement universal assisted decoding * Erase prompt tail for kv-cache * set vocab_dft_compatible in common_speculative * rename ctx_main to ctx_tgt * move vocab_dft_compatible to spec struct * clear mem_dft, remove mem * detokenize id_last for incompatible models * update comment * add --spec-replace flag * accept special tokens when translating between draft/main models * Escape spec-replace * clamp draft result to size to params.n_draft * fix comment * clean up code * restore old example * log common_speculative_are_compatible in speculative example * fix * Update common/speculative.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/speculative.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/speculative.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-07-31 14:25:23 +02:00
Georgi Gerganov	abd4d0bc4f	speculative : update default params (#11954 ) * speculative : update default params * speculative : do not discard the last drafted token	2025-02-19 13:29:42 +02:00
Maxim Evtush	7b891bdc86	fix: typos in documentation files (#11791 ) * Update ggml.c * Update arg.cpp * Update speculative.h	2025-02-10 23:21:31 +01:00
Georgi Gerganov	d9d54e498d	speculative : refactor and add a simpler example (#10362 ) * speculative : refactor and add a simpler example ggml-ci * speculative : clean-up and add comments and TODOs [no ci] * speculative : manage context in common_speculative ggml-ci * speculative : simplify ggml-ci * speculative : simplify (cont) ggml-ci * speculative : add --draft-min CLI arg * speculative : minor fixup * make : build fixes * speculative : do not redraft previous drafts ggml-ci * speculative : fix the draft sampling ggml-ci * speculative : fix compile warning * common : refactor args ggml-ci * common : change defaults [no ci] * common : final touches ggml-ci	2024-11-25 09:58:41 +02:00

5 Commits