Daniel Bevenius
e1373fd89c
common : add common_prompt_batch_decode function
...
This commit adds a new function that is responsible for decoding the
prompt and optionally handling the saving of session data.
2026-02-06 12:46:55 +01:00
Daniel Bevenius
44bddc0a89
completion : add replaying of session state
...
This commit updates the session handling in the completion tool to handle
the fact that logits are no longer stored in the session file. Instead, we
need to replay the last token to get the logits for sampling.
2026-02-06 12:46:55 +01:00
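The replay idea described in the commit above can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_context`, `restore_session` and `replay_last_token` are hypothetical names invented for illustration and are not llama.cpp APIs, and the real change may differ in how it trims and re-decodes the cached state.

```cpp
#include <vector>

// Hypothetical stand-in for a model context (not a llama.cpp type).
struct toy_context {
    std::vector<int>   cache;   // tokens whose state is already cached
    std::vector<float> logits;  // empty until a token has been decoded

    // Decoding a token extends the cache and produces fresh logits.
    void decode(int token) {
        cache.push_back(token);
        logits = { static_cast<float>(token) };  // placeholder values
    }
};

// Assumption: restoring a session brings back the cached tokens only,
// because the logits are not part of the saved file.
void restore_session(toy_context & ctx, const std::vector<int> & session_tokens) {
    ctx.cache = session_tokens;
    ctx.logits.clear();
}

// Replay the last token so that logits exist before sampling starts.
void replay_last_token(toy_context & ctx) {
    if (ctx.cache.empty()) {
        return;  // nothing to replay
    }
    const int last = ctx.cache.back();
    ctx.cache.pop_back();  // drop the cached entry so it is not duplicated
    ctx.decode(last);      // decoding it again regenerates the logits
}
```

After `restore_session` the toy context has no logits; after `replay_last_token` it has the same cached tokens as before plus logits for the final position.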
Daniel Bevenius
25f40ca65f
completion : simplify batch (embd) processing ( #19286 )
...
* completion : simplify batch (embd) processing
This commit simplifies the processing of embd by removing the for loop
that currently exists which uses params.n_batch as its increment. This
commit also removes the clamping of n_eval as the size of embd is always
at most the size of params.n_batch.
The motivation is to clarify the code: read in isolation, the for loop
is a little confusing because it suggests that multiple batches can be
processed.
* add an assert to verify n_eval is not greater than n_batch
2026-02-04 05:43:28 +01:00
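The simplification described in the commit above can be sketched as a before/after pair. This is a minimal illustration, not the code from the completion tool: `fake_decoder`, `eval_looped` and `eval_single` are hypothetical names, and the decoder only records how many tokens each call received.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the real decode call.
struct fake_decoder {
    std::vector<int> call_sizes;  // number of tokens passed per call

    bool decode(const int * /*tokens*/, int n_tokens) {
        call_sizes.push_back(n_tokens);
        return true;
    }
};

// Before: a loop stepping by n_batch, with n_eval clamped to n_batch.
bool eval_looped(fake_decoder & dec, const std::vector<int> & embd, int n_batch) {
    for (int i = 0; i < (int) embd.size(); i += n_batch) {
        int n_eval = (int) embd.size() - i;
        if (n_eval > n_batch) {
            n_eval = n_batch;
        }
        if (!dec.decode(&embd[i], n_eval)) {
            return false;
        }
    }
    return true;
}

// After: embd never exceeds n_batch, so one call is enough and the
// invariant is checked with an assert instead of a clamp.
bool eval_single(fake_decoder & dec, const std::vector<int> & embd, int n_batch) {
    if (embd.empty()) {
        return true;
    }
    const int n_eval = (int) embd.size();
    assert(n_eval <= n_batch);
    return dec.decode(embd.data(), n_eval);
}
```

As long as `embd.size() <= n_batch` holds, both versions issue the same single decode call, which is why the loop and the clamp can be dropped.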
Georgi Gerganov
080b161995
completion : fix prompt cache for recurrent models ( #19045 )
2026-01-25 09:12:50 +02:00
o7si
daa242dfc8
common: fix return value check for setpriority ( #18412 )
...
* common: fix return value check for setpriority
* tools: add logging for process priority setting
2025-12-29 11:07:49 +02:00
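For context on the setpriority fix above: the POSIX `setpriority()` call returns 0 on success and -1 on failure, so a check that treats a zero return as an error reports failure when the call succeeded. The sketch below shows the convention on a POSIX system. The helper names `set_process_nice` and `get_process_nice` are hypothetical, and the exact check that the commit changed is not shown in this log.

```cpp
#include <sys/resource.h>  // setpriority, getpriority, PRIO_PROCESS
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not the llama.cpp function): true on success.
bool set_process_nice(int nice_value) {
    // 0 means success, -1 means failure with errno set.
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        std::fprintf(stderr, "failed to set process priority %d: %s\n",
                     nice_value, std::strerror(errno));
        return false;
    }
    return true;
}

// getpriority() can legitimately return -1 as a nice value, so errno
// has to be cleared before the call and inspected afterwards.
bool get_process_nice(int & out) {
    errno = 0;
    const int value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0) {
        return false;
    }
    out = value;
    return true;
}
```

Re-applying the current nice value is a convenient way to exercise the success path, since it does not ask for a higher priority and so needs no extra privileges.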
Xuan-Son Nguyen
7b1db3d3b7
arg: clarify auto kvu/np being set on server ( #17997 )
...
* arg: clarify auto kvu/np being set on server
* improve docs
* use invalid_argument
2025-12-16 12:01:27 +01:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes ( #17937 )
...
* common : refactor common_sampler + grammar logic changes
* tests : increase max_tokens to get needed response
* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Xuan-Son Nguyen
34a6d86982
cli: enable jinja by default ( #17911 )
...
* cli: enable jinja by default
* Update common/arg.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-10 22:19:42 +01:00
Xuan-Son Nguyen
6c2131773c
cli: new CLI experience ( #17824 )
...
* wip
* wip
* fix logging, add display info
* handle commands
* add args
* wip
* move old cli to llama-completion
* rm deprecation notice
* move server to a shared library
* move ci to llama-completion
* add loading animation
* add --show-timings arg
* add /read command, improve LOG_ERR
* add args for speculative decoding, enable show timings by default
* add arg --image and --audio
* fix windows build
* support reasoning_content
* fix llama2c workflow
* color default is auto
* fix merge conflicts
* properly fix color problem
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
* better loading spinner
* make sure to clean color on force-exit
* also clear input files on "/clear"
* simplify common_log_flush
* add warning in mtmd-cli
* implement console writer
* fix data race
* add attribute
* fix llama-completion and mtmd-cli
* add some notes about console::log
* fix compilation
---------
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00