Add --reasoning-budget-conclusion N flag that splits the reasoning budget
into a thinking phase and a conclusion phase:
- At the end of the thinking budget, inject --reasoning-budget-message and
  enter the INJECTING state (forces the message tokens token-by-token)
- After the message is injected, enter the CONCLUDING state, giving the
  model N free tokens to terminate naturally
- If the model does not self-terminate, fall through to FORCING (hard
  cutoff) as a safety net
The sampler state machine gains the INJECTING and CONCLUDING states:
IDLE -> COUNTING -> INJECTING -> CONCLUDING -> FORCING -> DONE
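
For reviewers, the transitions above can be sketched as a small standalone
model. This is an illustration only, not the sampler code in this change:
BudgetSketch, step() and run_sketch() are hypothetical names, tokens are plain
ints, and the start and end markers are assumed to be single tokens. With a
conclusion budget of 0 the sketch forces the message and then the end marker
immediately, which is an assumption about the pre-existing behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the phases described in this message.
enum class State { IDLE, COUNTING, INJECTING, CONCLUDING, FORCING, DONE };

struct BudgetSketch {
    int thinking_budget;       // tokens the model may emit before injection
    int conclusion_budget;     // N free tokens after the message (0 = none)
    std::vector<int> message;  // tokenized budget message, may be empty
    int start_tok;             // assumed single-token reasoning start marker
    int end_tok;               // assumed single-token reasoning end marker

    State state = State::IDLE;
    int remaining = 0;         // tokens left in the current counted phase
    std::size_t pos = 0;       // next message token to inject

    BudgetSketch(int thinking, int conclusion, std::vector<int> msg, int start, int end)
        : thinking_budget(thinking), conclusion_budget(conclusion),
          message(std::move(msg)), start_tok(start), end_tok(end) {
        assert(thinking >= 0 && conclusion >= 0);
    }

    // Takes the token the model proposes; returns the token actually emitted.
    int step(int proposed) {
        switch (state) {
        case State::IDLE:
            if (proposed == start_tok) {
                state = State::COUNTING;
                remaining = thinking_budget;
            }
            return proposed;
        case State::COUNTING:
            if (proposed == end_tok) { state = State::DONE; return proposed; }
            if (remaining > 0) { --remaining; return proposed; }
            // Thinking budget exhausted: start injecting the message.
            state = State::INJECTING;
            pos = 0;
            return step(proposed);
        case State::INJECTING:
            if (pos < message.size()) { return message[pos++]; }
            // Message fully injected (or empty): open the conclusion window,
            // or go straight to the hard cutoff when the window is 0.
            if (conclusion_budget > 0) {
                state = State::CONCLUDING;
                remaining = conclusion_budget;
            } else {
                state = State::FORCING;
            }
            return step(proposed);
        case State::CONCLUDING:
            if (proposed == end_tok) { state = State::DONE; return proposed; }
            if (remaining > 0) { --remaining; return proposed; }
            // Conclusion window exhausted: safety net.
            state = State::FORCING;
            return step(proposed);
        case State::FORCING:
            state = State::DONE;
            return end_tok;  // override whatever the model proposed
        case State::DONE:
            return proposed;
        }
        return proposed;
    }
};

// Feeds a sequence of proposed tokens through the sketch and collects the output.
std::vector<int> run_sketch(BudgetSketch & s, const std::vector<int> & proposed) {
    std::vector<int> out;
    for (int tok : proposed) {
        out.push_back(s.step(tok));
    }
    return out;
}
```

In this model, a proposed end marker inside the conclusion window reaches DONE
without passing through FORCING, while a model that never proposes it gets the
end marker forced once the N free tokens are used up.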
Setting --reasoning-budget-conclusion 0 (the default) preserves existing
behavior exactly, so the change is fully backward compatible.
Add 5 new tests to test-reasoning-budget.cpp covering:
- natural end in conclusion window (no FORCING)
- conclusion budget exhausted, safety net fires
- no message tokens, conclusion budget only
- backward compat with conclusion_budget=0
- multi-token message injection
Implements Option B from issue #20632.