llama.cpp/include
samuel fe2baf5e2d Squashed commit of the following:
commit 912ed2cd9339d1b2875d98744ca5b51fa62e581e
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sun Dec 7 23:00:29 2025 -0300

    speculative (feat): implement recursive MTP drafting for GLM-4.5

commit bdf72d9552e3da64ffc85f175664713388752914
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Dec 6 16:10:16 2025 -0300

    sampling (feat): optimize speculative drafting with fast-path selection

commit a91980a8f3475a6bbac0a64d8be06dd4b613020e
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Dec 6 15:18:19 2025 -0300

    mtp (chore): clean old code

commit 6de0ecf55db8567db4faa99b0152b72c9e854548
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Dec 6 14:40:13 2025 -0300

    mtp (feat): add mtp arg

commit ea77394183b8e6c368af969b8274039a54b11486
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Dec 6 13:47:54 2025 -0300

    mtp-graph (fix): move llama_get_logits_ith outside the loop

commit 15dff208958fb66802f20ec53ce5fcaff133edb7
Merge: 171346c74 cae85fe53
Author: samuel <samueloliveira32df@gmail.com>
Date:   Thu Oct 16 13:44:41 2025 -0300

    Merge branch 'glm4-mtp-batch' of https://github.com/SamuelOliveirads/llama.cpp into glm4-mtp-graph-cache

commit cae85fe531
Author: samuel <samueloliveira32df@gmail.com>
Date:   Thu Oct 16 13:42:31 2025 -0300

    mtp-batch(fix): avoid logits for mtp kv cache operations

commit 171346c742c310bbcfbd786b61250638ccf8b44d
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sun Oct 12 16:33:01 2025 -0300

    mtp-graph(feat): Reactivate graph reuse only for main model path

commit 0127c6beeb
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Oct 11 22:20:54 2025 -0300

    mtp-batch(chore): Remove final MTP debug logs and dead code

commit 4bcc9e261e
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Oct 11 18:51:22 2025 -0300

    mtp-batch(fix): Correctly advance cache head and add MTP documentation

commit b4cbe030ac
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Oct 11 18:37:40 2025 -0300

    mtp-batch(chore): Fix logit flags for speculative sampling and remove debug logs

commit a99709d0c1
Author: samuel <samueloliveira32df@gmail.com>
Date:   Fri Oct 10 17:24:34 2025 -0300

    mtp-batch(refactor): Extract decode context and MTP input logic into helper methods

commit 913af8f48d
Author: samuel <samueloliveira32df@gmail.com>
Date:   Fri Oct 10 16:44:28 2025 -0300

    mtp-batch(refactor): Replace MTP boolean flags with an explicit operation enum

commit 6f74ba3807
Author: samuel <samueloliveira32df@gmail.com>
Date:   Thu Oct 9 22:27:18 2025 -0300

    mtp-batch (fix): prevent mtp draft from polluting the cache

commit 5e1d719bef
Author: samuel <samueloliveira32df@gmail.com>
Date:   Thu Oct 9 15:21:23 2025 -0300

    mtp-batch (feat): Create and manage sinfo for MTP

commit febd8235d2
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sun Oct 5 14:43:40 2025 -0300

    mtp-batch (wip): fix how to warmup kv cache for MTP

commit 67c6c069e0
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Sep 27 19:42:32 2025 -0300

    mtp-batch (wip): Isolate MTP graph to prevent host embedding buffer corruption

commit 75dc25e6fe
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Sep 27 17:17:00 2025 -0300

    mtp-batch (wip): organize batch for mtp cache

commit 3da7e7f330
Author: samuel <samueloliveira32df@gmail.com>
Date:   Tue Sep 23 22:45:11 2025 -0300

    mtp-batch (fix): warm mtp cache for small batch size

commit df64508b93
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sun Sep 21 21:55:41 2025 -0300

    mtp-batch (wip): merge glm graphs

commit 042eb8a829
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sun Sep 21 21:29:00 2025 -0300

    mtp-batch (wip): merge mtp and model graph

commit 1318b2de82
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sun Sep 14 10:22:59 2025 -0300

    mtp-batch (wip): move mtp execution to batch format

commit c6237c71ff
Merge: 9fab53e43 8742ce0e3
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Sat Sep 13 02:57:01 2025 -0400

    Merge pull request #1 from SamuelOliveirads/glm4-moe-mtp

    feat: implemented sampling for MTP

commit 8742ce0e39
Author: samuel <samueloliveira32df@gmail.com>
Date:   Sat Sep 6 00:21:18 2025 -0300

    feat: apply logits + greedy sampler

commit 5a5bce8577
Author: samuel <samueloliveira32df@gmail.com>
Date:   Wed Sep 3 17:56:14 2025 -0300

    fix: add sample acceptance

commit 07670a22c6
Author: samuel <samueloliveira32df@gmail.com>
Date:   Wed Sep 3 13:25:21 2025 -0300

    feat: implemented sampling for MTP

commit 9fab53e438
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Tue Sep 2 17:14:09 2025 -0400

    fixed mtp kv cache update step in cases where prompt size > n_batch and n_ubatch

commit 98bc0c6bf2
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Tue Aug 26 01:26:51 2025 -0400

    replace standard sampler with greedy sampler for mtp draft

commit 471e026327
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Tue Aug 19 23:10:56 2025 -0400

    fixed vram leak

commit d72f9d5691
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Tue Aug 19 01:50:34 2025 -0400

    kludge-y kv cache management of mtp layer

commit 382135aa36
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Sun Aug 17 21:54:45 2025 -0400

    fixed mtp kv cache update sequencing after prompt processing

commit 6870f9790c
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Sun Aug 17 04:59:36 2025 -0400

    added proper KV cache management for MTP layers and slightly refactored

commit 6e9bafc7a7
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Fri Aug 15 23:13:56 2025 -0400

    failed attempt to implement MTP; outputs tokens but KV cache management is unreasonable

commit cf0f7c0448
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Wed Aug 13 02:21:17 2025 -0400

    broad thrust of the mtp implementation

commit 03231da69e
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Tue Aug 12 01:03:59 2025 -0400

    add model member function to build mtp graph, to be called from speculative.cpp

commit 1f477b3755
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Mon Aug 11 20:54:45 2025 -0400

    make nextn weights loadable without a crash

commit e434f87cc7
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Mon Aug 11 01:21:47 2025 -0400

    some work towards building mtp layer graph

commit db60623e79
Author: Aaron Lee <lee.aaron.65@gmail.com>
Date:   Sun Aug 10 23:52:54 2025 -0400

    added getter for nextn layer count and server slot has_mtp property
2025-12-21 17:23:35 -05:00
..
llama-cpp.h llama : add `llama_vocab`, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
llama.h Squashed commit of the following: 2025-12-21 17:23:35 -05:00