Commit Graph

6 Commits

Author SHA1 Message Date
Xuan-Son Nguyen f896d2c34f
server: improve speed of speculative decoding (#17808)
* server: improve speed of speculative decoding

* fix small draft case

* add link to the PR

* server : fix generation time measurement

* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)

* server : add comment

* add PR to docs

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-08 14:35:28 +01:00
Xuan-Son Nguyen c42712b056
server: support multiple generations from one prompt (OAI "n" option) (#17775)
* backend support

* server: support multiple generations from one prompt (OAI "n" option)

* fix invalid batch

* format oai

* clean up

* disable ctx shift

* add test

* update comments

* fix style

* add n_cmpl to docs [no ci]

* allowing using both n_cmpl and n
2025-12-06 15:54:38 +01:00
Xuan-Son Nguyen 13628d8bdb
server: add --media-path for local media files (#17697)
* server: add --media-path for local media files

* remove unused fn
2025-12-02 22:49:20 +01:00
Xuan-Son Nguyen 5d6bd842ea
server: remove default "gpt-3.5-turbo" model name (#17668)
* server: remove default "gpt-3.5-turbo" model name

* do not reflect back model name from request

* fix test
2025-12-02 11:38:57 +01:00
Fredrik Hultin ddf9f94389
server : add Anthropic Messages API support (#17570)
* server : add Anthropic Messages API support

* remove -@pytest.mark.slow from tool calling/jinja tests

* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py

* server : removed redundant n field logic in anthropic_params_from_json

* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()

* server : refactor Anthropic API to use OAI conversion

* make sure basic test always go first

* clean up

* clean up api key check, add test

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-28 12:57:04 +01:00
Xuan-Son Nguyen b8372eecd9
server: split server.cpp code into server/common/task/queue (#17362)
* add server-task, server-common

* add server-queue

* rm redundant includes

* move enum stop_type to server-task

* server : headers cleanup

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-24 14:41:53 +01:00