llama.cpp

Commit Graph

Author	SHA1	Message	Date

Zeel

02d4c32517

common: add two-phase graceful reasoning budget termination

Add --reasoning-budget-conclusion N flag that splits the reasoning budget
into a thinking phase and a conclusion phase:

- At end of thinking budget, inject --reasoning-budget-message and enter
  INJECTING state (forces message tokens token-by-token)
- After message is injected, enter CONCLUDING state giving the model N
  free tokens to terminate naturally
- If model does not self-terminate, fall through to FORCING (hard cutoff)
  as a safety net

New states added to the sampler state machine:
  IDLE -> COUNTING -> INJECTING -> CONCLUDING -> FORCING -> DONE

Setting --reasoning-budget-conclusion 0 (the default) preserves existing
behavior exactly — fully backward compatible.

Add 5 new tests to test-reasoning-budget.cpp covering:
- natural end in conclusion window (no FORCING)
- conclusion budget exhausted, safety net fires
- no message tokens, conclusion budget only
- backward compat with conclusion_budget=0
- multi-token message injection

Implements Option B from issue #20632.

2026-04-01 00:19:47 -04:00
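The state flow described in the commit message above can be sketched as a small standalone state machine. This is a minimal illustration, not the code from the commit: only the state names come from the message, while `rb_sketch`, its fields, `forced_token()`, and `accept()` are hypothetical names. It also assumes that FORCING emits a single end-of-reasoning token and that a conclusion budget of 0 skips straight to FORCING; the real sampler may differ on both points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// State names taken from the commit message; everything else is hypothetical.
enum class rb_state { IDLE, COUNTING, INJECTING, CONCLUDING, FORCING, DONE };

struct rb_sketch {
    int32_t thinking_budget;       // tokens allowed in the thinking phase
    int32_t conclusion_budget;     // N free tokens after the message (0 = single-phase)
    std::vector<int32_t> message;  // tokenized budget message to inject
    int32_t start_tok;             // assumed single token that opens reasoning
    int32_t end_tok;               // assumed single token that closes reasoning

    rb_state state     = rb_state::IDLE;
    int32_t  remaining = 0;
    size_t   msg_pos   = 0;

    rb_sketch(int32_t thinking, int32_t conclusion, std::vector<int32_t> msg,
              int32_t start, int32_t end)
        : thinking_budget(thinking), conclusion_budget(conclusion),
          message(std::move(msg)), start_tok(start), end_tok(end) {}

    // Token the sampler would force next, or -1 when the model samples freely.
    int32_t forced_token() const {
        if (state == rb_state::INJECTING) return message[msg_pos];
        if (state == rb_state::FORCING)   return end_tok;
        return -1;
    }

    // Called when the thinking budget runs out.
    void on_budget_exhausted() {
        if (conclusion_budget <= 0) {
            state = rb_state::FORCING;      // assumed single-phase behavior
        } else if (!message.empty()) {
            state   = rb_state::INJECTING;  // force the message token by token
            msg_pos = 0;
        } else {
            state     = rb_state::CONCLUDING;  // no message: go straight to free tokens
            remaining = conclusion_budget;
        }
    }

    // Advance the machine with the token that was actually emitted.
    void accept(int32_t tok) {
        switch (state) {
            case rb_state::IDLE:
                if (tok == start_tok) {
                    state     = rb_state::COUNTING;
                    remaining = thinking_budget;
                    if (remaining <= 0) on_budget_exhausted();
                }
                break;
            case rb_state::COUNTING:
                if (tok == end_tok) { state = rb_state::DONE; break; }
                if (--remaining <= 0) on_budget_exhausted();
                break;
            case rb_state::INJECTING:
                if (++msg_pos >= message.size()) {
                    state     = rb_state::CONCLUDING;
                    remaining = conclusion_budget;
                }
                break;
            case rb_state::CONCLUDING:
                if (tok == end_tok) { state = rb_state::DONE; break; }  // natural end
                if (--remaining <= 0) state = rb_state::FORCING;        // safety net
                break;
            case rb_state::FORCING:
                state = rb_state::DONE;  // forced end token was emitted
                break;
            case rb_state::DONE:
                break;
        }
    }
};
```

With a thinking budget of 2, a conclusion budget of 3, and a two-token message, the sketch moves IDLE -> COUNTING -> INJECTING -> CONCLUDING and reaches DONE without entering FORCING if the model emits the end token inside the conclusion window.
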

Aldehir Rojas

59d840209a

common : inhibit lazy grammar sampler while reasoning is active (#20970)

* common : inhibit grammar while reasoning budget is active

* cont : update force_pos in accept

* cont : fix tests

* cont : tweak should apply logic

* cont : return early not using grammar sampler

* Add tests

* cont : prevent backend sampling when reasoning budget enabled

* cont : fix typo

---------

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>

2026-03-27 18:30:40 +01:00

Piotr Wilkin (ilintar)

acb7c79069

common/parser: handle reasoning budget (#20297)

* v1

* Finished!

* Handlie cli

* Reasoning sampler

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Less explosive terminology :)

* Add utf-8 case and tests

* common : migrate reasoning budget sampler to common

* cont : clean up

* cont : expose state and allow passing as initial state

* cont : remove unused imports

* cont : update state machine doc string

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Alde Rojas <hello@alde.dev>

2026-03-11 10:26:12 +01:00

3 Commits