server : [easy] fix per round speculative decode logging (#18211)
Currently the drafted-token count in the per-round log is always 0, because slot.drafted is cleared before the log statement reads its size.

To reproduce: run llama-server with devstral-2 as the main model and devstral-2-small as the draft model (-md), with verbose logging:

```
% ./build/bin/llama-server -v \
    -m ~/llms/Devstral-2-123B-Instruct-2512-UD-Q6_K_XL-00001-of-00003.gguf \
    -md ~/llms/Devstral-Small-2-24B-Instruct-2512-UD-Q2_K_XL.gguf \
    -c 8192 2> /tmp/llama.cpp.debug
```

Check the log:

```
slot update_slots: id 3 | task 0 | accepted 11/0 draft tokens, new n_tokens = 741
slot update_slots: id 3 | task 0 | accepted 4/0 draft tokens, new n_tokens = 746
slot update_slots: id 3 | task 0 | accepted 16/0 draft tokens, new n_tokens = 763
slot update_slots: id 3 | task 0 | accepted 11/0 draft tokens, new n_tokens = 775
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 778
slot update_slots: id 3 | task 0 | accepted 4/0 draft tokens, new n_tokens = 783
slot update_slots: id 3 | task 0 | accepted 8/0 draft tokens, new n_tokens = 792
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 795
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 797
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 799
slot update_slots: id 3 | task 0 | accepted 0/0 draft tokens, new n_tokens = 800
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 803
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 805
slot update_slots: id 3 | task 0 | accepted 6/0 draft tokens, new n_tokens = 812
slot update_slots: id 3 | task 0 | accepted 3/0 draft tokens, new n_tokens = 816
```

After the fix, the per-round logging is correct:

```
slot update_slots: id 3 | task 0 | accepted 7/8 draft tokens, new n_tokens = 654
slot update_slots: id 3 | task 0 | accepted 1/2 draft tokens, new n_tokens = 656
slot update_slots: id 3 | task 0 | accepted 2/16 draft tokens, new n_tokens = 659
slot update_slots: id 3 | task 0 | accepted 1/16 draft tokens, new n_tokens = 661
slot update_slots: id 3 | task 0 | accepted 2/16 draft tokens, new n_tokens = 664
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 681
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 698
slot update_slots: id 3 | task 0 | accepted 3/4 draft tokens, new n_tokens = 702
slot update_slots: id 3 | task 0 | accepted 5/12 draft tokens, new n_tokens = 708
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 725
slot update_slots: id 3 | task 0 | accepted 1/1 draft tokens, new n_tokens = 727
slot update_slots: id 3 | task 0 | accepted 8/16 draft tokens, new n_tokens = 736
```
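The ordering problem behind the `N/0` output can be sketched in isolation. This is a minimal illustration and not the server code: `toy_slot`, `log_round_buggy`, and `log_round_fixed` are hypothetical names, and saving the size into `n_draft` right before the clear is an assumption about where the fix gets its value.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for a server slot; only the field relevant here.
struct toy_slot {
    std::vector<int> drafted; // tokens proposed by the draft model this round
};

// Buggy order: the draft buffer is cleared before its size is formatted,
// so the denominator is always 0.
std::string log_round_buggy(toy_slot & slot, int n_accepted) {
    slot.drafted.clear();
    char buf[64];
    std::snprintf(buf, sizeof(buf), "accepted %d/%d draft tokens",
                  n_accepted, (int) slot.drafted.size());
    return buf;
}

// Fixed order: save the size first, then clear, then log the saved value.
std::string log_round_fixed(toy_slot & slot, int n_accepted) {
    const size_t n_draft = slot.drafted.size(); // assumed capture point
    slot.drafted.clear();
    char buf[64];
    std::snprintf(buf, sizeof(buf), "accepted %d/%d draft tokens",
                  n_accepted, (int) n_draft);
    return buf;
}
```

With eight drafted tokens and seven accepted, the first function formats the count as `7/0` and the second as `7/8`, matching the shape of the before and after logs.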
parent 9e39a1e6a9
commit 408616adbd
@@ -2628,7 +2628,7 @@ struct server_context_impl {
 }
 }
 
-SLT_DBG(slot, "accepted %d/%d draft tokens, new n_tokens = %d\n", (int) ids.size() - 1, (int) slot.drafted.size(), slot.prompt.n_tokens());
+SLT_DBG(slot, "accepted %d/%d draft tokens, new n_tokens = %d\n", (int) ids.size() - 1, (int) n_draft, slot.prompt.n_tokens());
 }
 }
 