Georgi Gerganov
38882247d3
Merge branch 'master' into HEAD
2025-12-10 17:07:21 +02:00
Xuan-Son Nguyen
6c2131773c
cli: new CLI experience (#17824)
* wip
* wip
* fix logging, add display info
* handle commands
* add args
* wip
* move old cli to llama-completion
* rm deprecation notice
* move server to a shared library
* move ci to llama-completion
* add loading animation
* add --show-timings arg
* add /read command, improve LOG_ERR
* add args for speculative decoding, enable show timings by default
* add arg --image and --audio
* fix windows build
* support reasoning_content
* fix llama2c workflow
* color default is auto
* fix merge conflicts
* properly fix color problem
* better loading spinner
* make sure to clean color on force-exit
* also clear input files on "/clear"
* simplify common_log_flush
* add warning in mtmd-cli
* implement console writer
* fix data race
* add attribute
* fix llama-completion and mtmd-cli
* add some notes about console::log
* fix compilation
---------
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Georgi Gerganov
81cb5783c8
Merge branch 'master' into HEAD
2025-12-10 13:41:32 +02:00
Georgi Gerganov
560ac16f7d
server : handle unsupported cases
2025-12-09 10:55:11 +02:00
Georgi Gerganov
f3beb22b17
sampling : handle n_probs case
2025-12-08 21:30:10 +02:00
Xuan-Son Nguyen
951520ddb0
server: delegate result_state creation to server_task (#17835)
* server: delegate result_state creation to server_task
* remove unused states
* add more docs
2025-12-08 17:04:38 +01:00
Georgi Gerganov
6d38db5dfe
Merge branch 'master' into HEAD
2025-12-08 17:55:24 +02:00
Xuan-Son Nguyen
f896d2c34f
server: improve speed of speculative decoding (#17808)
* server: improve speed of speculative decoding
* fix small draft case
* add link to the PR
* server : fix generation time measurement
* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)
* server : add comment
* add PR to docs
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-08 14:35:28 +01:00
Georgi Gerganov
2bc96931d2
server : make cache_reuse configurable per request (#17858)
2025-12-08 12:43:12 +02:00
Xuan-Son Nguyen
c42712b056
server: support multiple generations from one prompt (OAI "n" option) (#17775)
* backend support
* server: support multiple generations from one prompt (OAI "n" option)
* fix invalid batch
* format oai
* clean up
* disable ctx shift
* add test
* update comments
* fix style
* add n_cmpl to docs [no ci]
* allow using both n_cmpl and n
2025-12-06 15:54:38 +01:00
Georgi Gerganov
30742a6ff5
sampling : expand support (wip)
2025-12-06 16:51:56 +02:00
Oliver Simons
7668999518
Merge branch 'master' into gpu-sampling
Let's keep `master`'s cumsum implementation for its likely better AMD
perf and add back the pure-CUB implementation in a follow-up commit
2025-12-05 14:41:08 +01:00
Georgi Gerganov
6958d41366
sampling : check backend support during init
2025-12-04 17:29:08 +02:00
Xuan-Son Nguyen
c4c10bfb86
server: move msg diffs tracking to HTTP thread (#17740)
* server: move msg diffs tracking to HTTP thread
* wip
* tool call tests ok
* minor : style
* cont : fix
* move states to server_response_reader
* add safe-guard
* fix
* fix 2
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-04 15:46:08 +01:00
Daniel Bevenius
c0b182f4d6
Merge remote-tracking branch 'upstream/master' into backend-sampling
2025-12-04 08:17:50 +01:00
Xuan-Son Nguyen
13628d8bdb
server: add --media-path for local media files (#17697)
* server: add --media-path for local media files
* remove unused fn
2025-12-02 22:49:20 +01:00
Daniel Bevenius
2595818a68
Merge remote-tracking branch 'upstream/master' into backend-sampling
2025-12-02 12:07:01 +01:00
Xuan-Son Nguyen
5d6bd842ea
server: remove default "gpt-3.5-turbo" model name (#17668)
* server: remove default "gpt-3.5-turbo" model name
* do not reflect back model name from request
* fix test
2025-12-02 11:38:57 +01:00
Daniel Bevenius
3e9a258c14
Merge remote-tracking branch 'upstream/master' into gpu-sampling
2025-12-02 09:26:04 +01:00
Xuan-Son Nguyen
ecf74a8417
mtmd: add mtmd_context_params::warmup option (#17652)
* mtmd: add mtmd_context_params::warmup option
* reuse the common_params::warmup
2025-12-01 21:32:25 +01:00
Georgi Gerganov
c187003d81
llama : naming
2025-11-30 00:05:47 +02:00
Georgi Gerganov
467746e3ad
Merge branch 'master' into HEAD
2025-11-29 23:17:25 +02:00
Xuan-Son Nguyen
ab49f094d2
server: move server-context to its own cpp|h (#17595)
* git mv
* add server-context.h
* clean up headers
* cont : cleanup
* also expose server_response_reader (to be used by CLI)
* fix windows build
* decouple server_routes and server_http
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-29 22:04:44 +01:00