llama.cpp

Commit Graph

Author	SHA1	Message	Date
손희준	fbbf3ad190	server: /v1/responses (partial) (#18486 ) * from previous PR * Make instruction(system) as first message * Convert [input_message] (text/image/file) * Rename convert_responses_to_chatcmpl(body) -> response_body * Initial tool call support * Erase instructions field from chatcmpl body * Feed reasoning texts to chat template * Use std::vector instead of opaque json array * Make output_item.added events consistent * Move `server_task_result_cmpl_partial::update` from header to source * Match ID of output_item.added and .done events * Add function_call only if there is no "fc_" prefix * Add function call output at non-streaming API * Test if ID is persistent * Add doc * Fix style - use trailing comma * Rewrite state management * catch up with upstream/master * Fix style - "type" is the first item of SSE data * Explicitly check "instructions" from response_body * Make lambdas static * Check if reasoning content exists * Add `oai_resp_id` to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final * Reject `input_file` since it is not supported by chatcmpl * Add "fc_" prefix to non-straming function call id as coderabbit pointed out --------- Co-authored-by: openingnow <>	2026-01-21 17:47:23 +01:00
Adrien Gallouët	1c7cf94b22	common, server : use the same User-Agent by default (#18957 ) This commit also ensures that if a custom User-Agent is used, it will be the only one sent. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-20 18:28:43 +01:00
Xuan-Son Nguyen	2c1f199653	cli : fix reasoning responses in CLI (#18961 ) * cli : fix reasoning responses in CLI * fix build * fix build (2)	2026-01-20 18:23:25 +01:00
Xuan-Son Nguyen	6df686bee6	server : refactor oai_parser_opt, move it to server_chat_params (#18937 ) * server_chat_params * move chat format into CLI * use meta whenever possible * clean up, no more chatml fallback	2026-01-19 23:28:01 +01:00
Xuan-Son Nguyen	6ce863c803	server: prevent data race from HTTP threads (#18263 ) * server: prevent data race from HTTP threads * fix params * fix default_generation_settings * nits: make handle_completions_impl looks less strange * stricter const * fix GGML_ASSERT(idx < states.size()) * move index to be managed by server_response_reader * http: make sure req & res lifecycle are tied together * fix compile * fix index handling buggy * fix data race for lora endpoint * nits: fix shadow variable * nits: revert redundant changes * nits: correct naming for json_webui_settings	2025-12-22 14:23:34 +01:00
Xuan-Son Nguyen	f896d2c34f	server: improve speed of speculative decoding (#17808 ) * server: improve speed of speculative decoding * fix small draft case * add link to the PR * server : fix generation time measurement * server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros) * server : add comment * add PR to docs --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-08 14:35:28 +01:00
Xuan-Son Nguyen	c42712b056	server: support multiple generations from one prompt (OAI "n" option) (#17775 ) * backend support * server: support multiple generations from one prompt (OAI "n" option) * fix invalid batch * format oai * clean up * disable ctx shift * add test * update comments * fix style * add n_cmpl to docs [no ci] * allowing using both n_cmpl and n	2025-12-06 15:54:38 +01:00
Xuan-Son Nguyen	13628d8bdb	server: add --media-path for local media files (#17697 ) * server: add --media-path for local media files * remove unused fn	2025-12-02 22:49:20 +01:00
Xuan-Son Nguyen	5d6bd842ea	server: remove default "gpt-3.5-turbo" model name (#17668 ) * server: remove default "gpt-3.5-turbo" model name * do not reflect back model name from request * fix test	2025-12-02 11:38:57 +01:00
Fredrik Hultin	ddf9f94389	server : add Anthropic Messages API support (#17570 ) * server : add Anthropic Messages API support * remove -@pytest.mark.slow from tool calling/jinja tests * server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py * server : removed redundant n field logic in anthropic_params_from_json * server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream() * server : refactor Anthropic API to use OAI conversion * make sure basic test always go first * clean up * clean up api key check, add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-28 12:57:04 +01:00
Xuan-Son Nguyen	b8372eecd9	server: split server.cpp code into server/common/task/queue (#17362 ) * add server-task, server-common * add server-queue * rm redundant includes * move enum stop_type to server-task * server : headers cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-24 14:41:53 +01:00

11 Commits