Commit Graph

1 Commit

Author SHA1 Message Date
itigges22 4aeffc690d doc: document MTP attention requirement for higher acceptance
The MTP head has attention weights (Q/K/V), but they are currently unused
(the head runs an FFN-only path). Adding attention requires resolving ggml
buffer allocation for the MTP layer, which is configured with has_kv=false.

Approaches tried:
- build_attn with KV cache at il_kv=31: corrupts the main model's KV cache
- build_attn_inp_no_cache: GGML_ASSERT(buffer) failed
- build_attn_mha: GGML_ASSERT(buffer) failed
- Manual attention with ggml ops: GGML_ASSERT(buffer) failed
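For reference, the computation the "manual attention" approach tries to express as ggml ops is plain single-head scaled dot-product attention with a causal mask. A minimal sketch in standalone C++ (no ggml, since the blocker above is buffer allocation rather than the math; shapes and the single-head layout are illustrative assumptions, not taken from the model):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Causal single-head scaled dot-product attention:
//   out = softmax(Q K^T / sqrt(d)) V
// Q, K, V are row-major [n_tok x d] matrices.
std::vector<float> sdpa(const std::vector<float>& Q,
                        const std::vector<float>& K,
                        const std::vector<float>& V,
                        int n_tok, int d) {
    std::vector<float> out(n_tok * d, 0.0f);
    const float scale = 1.0f / std::sqrt((float)d);
    for (int i = 0; i < n_tok; ++i) {
        // scores of query i against keys j <= i (causal mask)
        std::vector<float> s(i + 1);
        float mx = -1e30f;
        for (int j = 0; j <= i; ++j) {
            float dot = 0.0f;
            for (int k = 0; k < d; ++k) dot += Q[i*d+k] * K[j*d+k];
            s[j] = dot * scale;
            mx = std::max(mx, s[j]);
        }
        // numerically stable softmax over the masked scores
        float sum = 0.0f;
        for (int j = 0; j <= i; ++j) { s[j] = std::exp(s[j] - mx); sum += s[j]; }
        // weighted sum of value rows
        for (int j = 0; j <= i; ++j) {
            const float w = s[j] / sum;
            for (int k = 0; k < d; ++k) out[i*d+k] += w * V[j*d+k];
        }
    }
    return out;
}
```

In ggml terms this is roughly a ggml_mul_mat for the scores, a softmax, and another ggml_mul_mat against V; each intermediate is a tensor the scheduler must allocate a buffer for, which is exactly where the GGML_ASSERT(buffer) failures occur.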

Root cause: the graph scheduler doesn't allocate buffers for the MTP
layer's attention ops. Need to either extend n_layer_kv_from_start to
include MTP layers, or add the MTP attention to the graph plan before
the scheduler runs.

Current state: FFN-only MTP gives a 95% acceptance rate at temp=0.6.
2026-03-20 00:52:32 -04:00
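As a back-of-envelope check on what the reported 95% acceptance rate buys: under the usual speculative-decoding model (per-token acceptance treated as independent with probability p, k draft tokens verified per target forward pass, draft cost ignored), the expected number of tokens emitted per target step is the geometric sum 1 + p + ... + p^k. The values of p and k below are assumptions for illustration; the commit only reports p ≈ 0.95.

```cpp
#include <cassert>
#include <cmath>

// Expected tokens emitted per target-model forward pass when k draft
// tokens are proposed and each is accepted independently with
// probability p: 1 + p + p^2 + ... + p^k.
double expected_tokens_per_step(double p, int k) {
    double e = 0.0, pk = 1.0;
    for (int i = 0; i <= k; ++i) {
        e += pk;   // contribution of accepting the first i draft tokens
        pk *= p;
    }
    return e;
}
```

With a single MTP draft token (k=1) and p=0.95, this gives 1.95 tokens per target step, i.e. close to the 2x ceiling for one draft head.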