llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aleksander Grygier	72e5d9ae2a	chore: update webui build output	2026-02-13 13:35:58 +01:00
Aleksander Grygier	dd1fe96e18	feat: Improve formatting performance time	2026-02-13 13:35:58 +01:00
Aleksander Grygier	eed0c5ae48	fix: System prompt sorting	2026-02-13 13:35:58 +01:00
Aleksander Grygier	16aa6fae0a	fix: Save draft message in Chat Form when adding System Prompt from new chat view	2026-02-13 13:33:06 +01:00
Aleksander Grygier	0fe25847ff	fix: Chat Form submission	2026-02-13 13:33:06 +01:00
Aleksander Grygier	ed70cb577d	chore: update webui build output	2026-02-13 13:33:05 +01:00
Aleksander Grygier	141540ccbb	feat: MCP Prompts WIP	2026-02-13 13:33:05 +01:00
Aleksander Grygier	46ced87178	chore: update webui build output	2026-02-13 13:32:47 +01:00
Aleksander Grygier	43da6b8676	feat: UI improvements	2026-02-13 13:32:47 +01:00
Aleksander Grygier	17b326b32a	chore: update webui build output	2026-02-13 13:30:16 +01:00
Aleksander Grygier	aaeea933b7	feat: Architectural improvements	2026-02-13 13:30:16 +01:00
Aleksander Grygier	da252e3425	feat: Per-conversation agentic loop state	2026-02-13 13:28:24 +01:00
Aleksander Grygier	1565cda1ff	chore: update webui build output	2026-02-13 13:28:24 +01:00
Aleksander Grygier	a8c2e66e92	feat: Improve MCP Server selection UI + lazy load health checks	2026-02-13 13:28:24 +01:00
Aleksander Grygier	f8d6d16df1	feat: UI improvements	2026-02-13 13:21:35 +01:00
Aleksander Grygier	690dd09b5f	feat: Simplify MCP server enabling logic per chat Refactors MCP server enabling logic to remove the dependency on global settings. This simplifies the logic by directly checking the per-chat override status, and removes the need to pass the global enabled state as a parameter. Additionally: - Only shows MCP servers that are enabled in settings in the selector. - Sorts the servers by whether they are enabled for the current chat.	2026-02-13 13:21:35 +01:00
Aleksander Grygier	a12304cdea	chore: update webui build output	2026-02-13 13:21:35 +01:00
Aleksander Grygier	52f21b4ca4	fix: Missing onModelChange callback running assistant message re-generation	2026-02-13 13:21:35 +01:00
Pascal	20e5e70c61	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	a2cce59d69	fix: accurate tool_response display	2026-02-13 13:21:35 +01:00
Pascal	fdd67f45e6	fix: unify MCP server label logic with simplified fallback	2026-02-13 13:21:35 +01:00
Pascal	bdd9bcfb75	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	a515179730	refactor: remove multimodal validation from model selector Remove all frontend validation logic that prevented users from selecting models based on multimodal capabilities. This refactoring removes restrictive UI code while maintaining full functionality - Vision models can describe images as text - That text remains useful for non-vision models - Chaining vision -> non-vision is a valid workflow - Users know their use case better than the UI - Users can return to vision models when needed	2026-02-13 13:21:35 +01:00
Pascal	c7e76c65d1	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	37c084873c	fix: ignore assistant attachments (MCP) for modality detection	2026-02-13 13:21:35 +01:00
Pascal	d09cdfaf0a	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	6d41f74031	refactor: eliminate MCP circular dependency - Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore - Remove bidirectional callback pattern (setCallback, notify methods) - Add updateState/updateHealthCheck public methods in mcpStore - Replace callback calls with direct mcpStore method calls - Remove unused imports (browser, HealthCheckState) and constructor - Fixes CI: ReferenceError Cannot access mcpClient before initialization	2026-02-13 13:21:35 +01:00
Pascal	07ae189175	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	23741b3c6a	fix: strip reasoning content and UI proprietary tags from prompts TODO: add toggle and ensure backend API compliance for reasoning format	2026-02-13 13:21:34 +01:00
Pascal	b5b527fa52	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	fb1ec29898	refactor: remove reasoning after first turn filter	2026-02-13 13:21:34 +01:00
Pascal	fc5d9f587f	refactor: inline reasoning with tags, remove fixed thinking field	2026-02-13 13:21:34 +01:00
Pascal	6b3bc23fc2	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	c73baed7e3	feat: resolve MCP attachment images via rehype plugin LLM can reference tool-generated images using markdown links like, plugin resolves attachment names to base64 from message.extra when present, regular HTTP/data URLs pass through unchanged (no regression) - rehypeResolveAttachmentImages plugin in markdown pipeline - Pass message prop to MarkdownContent and AgenticContent - Force processor reactivity on message.extra changes - Filter assistant images from API context (display-only)	2026-02-13 13:21:34 +01:00
Pascal	09381a59fd	feat: persist base64 attachments from tool results	2026-02-13 13:21:34 +01:00
Pascal	f16457551e	webui: fix custom headers persistence in UI (derived)	2026-02-13 13:21:34 +01:00
Pascal	f42e5f114e	webui: fix custom headers persistence in UI	2026-02-13 13:21:34 +01:00
Aleksander Grygier	162bd976ed	fix: Word wrapping	2026-02-13 13:21:34 +01:00
Aleksander Grygier	c2dd1d2fed	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	008463149b	feat: UI improvements	2026-02-13 13:21:34 +01:00
Aleksander Grygier	1dba2ec4a9	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	805c171825	feat: UI improvement	2026-02-13 13:21:34 +01:00
Aleksander Grygier	d6455a7530	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	bb4bd7fe09	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	05dfb5e70c	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	cad9ca1208	feat: MCP Server Details	2026-02-13 13:21:34 +01:00
Aleksander Grygier	0e980bf881	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	825d2ea9a9	feat: MCP connection details WIP	2026-02-13 13:21:34 +01:00
Aleksander Grygier	2b37f70c37	refactor: MCP types and health check	2026-02-13 13:21:34 +01:00
Aleksander Grygier	36a37d1794	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	38ba6d8372	refactor: KeyValuePairs component	2026-02-13 13:21:34 +01:00
Aleksander Grygier	c5465d4893	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	57089370e4	refactor: DRY	2026-02-13 13:21:34 +01:00
Aleksander Grygier	f80d5f615e	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	e1da51335c	refactor: Architecture improvements	2026-02-13 13:21:34 +01:00
Aleksander Grygier	3bc8d93546	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	48b2b1b2f0	refactor: MCP state management + stores/clients relationship	2026-02-13 13:21:34 +01:00
Aleksander Grygier	2cd682178b	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	da8baaa9b8	fix: Distinguish streaming vs incomplete tool calls in UI	2026-02-13 13:21:34 +01:00
Aleksander Grygier	3179858e5f	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	9471729162	fix: Restore live reactive UI progress for tool calls	2026-02-13 13:21:34 +01:00
Aleksander Grygier	64923b20be	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	179477b4ed	fix: reset tool call state between turns	2026-02-13 13:21:34 +01:00
Pascal	38244a1bfa	webui: enable streaming of tool call arguments	2026-02-13 13:21:34 +01:00
Aleksander Grygier	2faf237d01	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	5ffb6aba3a	refactor: Cleanup	2026-02-13 13:21:34 +01:00
Pascal	96e51e2a41	webui: prevent mobile dropdown immediate close on synthetic click	2026-02-13 13:20:42 +01:00
Pascal	8916698294	webui: fix redirect to root ignoring base path	2026-02-13 13:20:42 +01:00
Aleksander Grygier	2a33fc2059	refactor: Cleanup	2026-02-13 13:20:41 +01:00
Aleksander Grygier	04913f20d9	chore: update webui build output	2026-02-13 13:20:41 +01:00
Aleksander Grygier	939e7aa16b	refactor: Types	2026-02-13 13:20:41 +01:00
Aleksander Grygier	bef865d871	refactor: Componentize McpServerCard	2026-02-13 13:20:41 +01:00
Aleksander Grygier	7dbb05a160	refactor: Cleanup	2026-02-13 13:20:41 +01:00
Aleksander Grygier	7e194f653a	fix: Remove redundant CSS class	2026-02-13 13:20:41 +01:00
Aleksander Grygier	02c87fa3c9	feat: Add TruncatedText component	2026-02-13 13:20:41 +01:00
Aleksander Grygier	27b80ae3e8	fix: Collapsible box trigger	2026-02-13 13:20:26 +01:00
Aleksander Grygier	408e098324	refactor: Cleanup	2026-02-13 13:20:26 +01:00
Aleksander Grygier	0b36d04c38	refactor: Cleanup	2026-02-13 13:20:07 +01:00
Aleksander Grygier	df464c1f5a	refactor: Collapsible Content Block & small fixes	2026-02-13 13:18:20 +01:00
Aleksander Grygier	26044454ef	chore: update webui build output	2026-02-13 13:18:20 +01:00
Aleksander Grygier	f0ac6fa039	refactor: Cleanup	2026-02-13 13:18:20 +01:00
Aleksander Grygier	7c9ba36216	chore: update webui build output	2026-02-13 13:18:20 +01:00
Aleksander Grygier	7ab269cd77	feat: UI improvements	2026-02-13 13:18:20 +01:00
Aleksander Grygier	e0122465ed	feat: Always show Mcp Selector	2026-02-13 13:18:20 +01:00
Pascal	36c9ad9303	fix: remove double scrollbar in model selector by using Bits UI content available height	2026-02-13 13:18:20 +01:00
Aleksander Grygier	bc60beb1a7	feat: Enable adding System Prompt per-chat	2026-02-13 13:18:20 +01:00
Aleksander Grygier	276a3e9416	fix: UI	2026-02-13 13:17:51 +01:00
Aleksander Grygier	c74065de75	chore: update webui build output	2026-02-13 13:17:51 +01:00
Aleksander Grygier	e6ad864984	feat: UI improvements	2026-02-13 13:17:51 +01:00
Pascal	cff237cb3e	webui: raw tool result display, strip only leading/trailing newlines to preserve indentation	2026-02-13 13:17:33 +01:00
Pascal	afb79b2970	webui: split raw output into backend parsing and frontend display options	2026-02-13 13:17:33 +01:00
Pascal	18efdabb12	webui: remove legacy wrapper and restore WebSocket transport	2026-02-13 13:17:33 +01:00
Pascal	a13782a4d1	webui: remove unused imports	2026-02-13 13:17:33 +01:00
Aleksander Grygier	d548bf27dd	chore: update webui build output	2026-02-13 13:17:33 +01:00
Aleksander Grygier	bdd5958f6d	feat: Improve agentic tool call streaming display with 'in progress' state	2026-02-13 13:17:32 +01:00
Aleksander Grygier	a9c2ea7a8e	feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides	2026-02-13 13:17:32 +01:00
Aleksander Grygier	dfce09b34b	feat: Add per-chat MCP server overrides	2026-02-13 13:17:32 +01:00
Aleksander Grygier	54374edecd	chore: update webui build output	2026-02-13 13:17:32 +01:00
Aleksander Grygier	b763a4cc69	feat: Add image load error fallback in MarkdownContent	2026-02-13 13:17:32 +01:00
Aleksander Grygier	af9a76b6dc	feat: Implement lazy MCP client shutdown	2026-02-13 13:17:32 +01:00
Aleksander Grygier	c7870a3903	feat: Enhance tool call streaming UI and output format	2026-02-13 13:17:32 +01:00
Aleksander Grygier	fb5e464fe7	feat: Display and manage servers in ChatForm actions	2026-02-13 13:17:32 +01:00
Aleksander Grygier	dc7a3f33ba	feat: Integrate server management dialog into chat settings	2026-02-13 13:03:15 +01:00
Aleksander Grygier	0b13c95519	feat: Implement dedicated server management UI components	2026-02-13 13:03:15 +01:00
Aleksander Grygier	8df7e4a54f	refactor: Centralize health check logic in store	2026-02-13 13:03:15 +01:00
Aleksander Grygier	9a8cae462e	feat: Enhance server config with headers and schema normalization	2026-02-13 13:03:15 +01:00
Aleksander Grygier	bc2d879dea	feat: Add McpLogo Svelte component	2026-02-13 13:03:15 +01:00
Aleksander Grygier	42d52605d9	refactor: Consolidate UI CSS classes into shared module	2026-02-13 13:03:15 +01:00
Aleksander Grygier	6c95020b06	chore: update webui build output	2026-02-13 12:57:23 +01:00
Aleksander Grygier	62dbc9f654	feat: Raw LLM output switch per message	2026-02-13 12:57:23 +01:00
Aleksander Grygier	284425097b	refactor: Tool call handling	2026-02-13 12:57:03 +01:00
Aleksander Grygier	5beeb88a37	docs: Update high-level architecture diagrams for MCP integration	2026-02-13 12:55:42 +01:00
Aleksander Grygier	acdd30e3af	feat: Add AgenticContent component for enhanced tool call rendering	2026-02-13 12:55:42 +01:00
Aleksander Grygier	49a8c8b148	refactor: Update ChatStore to leverage mcpStore for agentic flow	2026-02-13 12:55:42 +01:00
Aleksander Grygier	5b582beb75	feat: Implement agentic orchestration within ChatService	2026-02-13 12:55:03 +01:00
Aleksander Grygier	391479edb2	feat: Introduce reactive mcpStore for client lifecycle management	2026-02-13 12:55:03 +01:00
Aleksander Grygier	7e184c174d	feat: Refactor MCP client to use official SDK	2026-02-13 12:55:03 +01:00
Aleksander Grygier	1a041a5b9b	feat: Add @modelcontextprotocol/sdk and zod dependencies	2026-02-13 12:55:03 +01:00
Aleksander Grygier	2325d2a50d	refactor: Update Agentic and MCP config parsing to use new utils and constants	2026-02-13 12:55:03 +01:00
Aleksander Grygier	0c24db3178	feat: Centralize MCP and Agentic type definitions and constants	2026-02-13 12:55:02 +01:00
Aleksander Grygier	26a19183b7	feat: Introduce common utility functions	2026-02-13 12:55:02 +01:00
Pascal	14f6728ef1	webui: use normalizedMessages after upstream refactor	2026-02-13 12:55:02 +01:00
Pascal	cb99ed9f71	webui: MCP client with low coupling to current codebase	2026-02-13 12:55:02 +01:00
Aleksander Grygier	5174d7206f	webui: UI and routing fixes (#19586) * chore: update webui build output * chore: update webui build output * fix: Scroll issues in DropdownMenuSearchable * webui: fix redirect to root ignoring base path * fix: Word wrapping * fix: remove obsolete modality UI tests causing CI failures - Remove VisionModality/AudioModality test stories - Remove mockServerProps usage and imports - Simplify Default test (remove dropdown interaction checks) - Simplify FileAttachments test (remove mocks) * feat: Improve formatting performance time --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-02-13 12:31:00 +01:00
Aleksander Grygier	4c61875bf8	webui: Add switcher to Chat Message UI to show raw LLM output (#19571)	2026-02-12 19:55:51 +01:00
Aleksander Grygier	4d688f9ebb	(webui) FEATURE: Enable adding or injecting System Message into chat (#19556) * feat: Enable adding System Prompt per-chat * fix: Save draft message in Chat Form when adding System Prompt from new chat view * fix: Proper system message deletion logic * chore: Formatting * chore: update webui build output	2026-02-12 13:56:08 +01:00
Aleksander Grygier	f486ce9f30	(webui) REFACTOR: UI primitives and polish (#19551) * webui: UI primitives and polish (non-MCP) * chore: update webui build output	2026-02-12 12:21:00 +01:00
Aleksander Grygier	38adc7d469	WebUI Architecture Cleanup (#19541) * webui: architecture foundation (non-MCP core refactors) * chore: update webui build output	2026-02-12 11:22:27 +01:00
RichardScottOZ	fa16e517a3	server : fix typo in README.md for features list (#19510) extra l for full	2026-02-12 08:56:25 +01:00
손희준	820ebfa6f4	Server: log when converting requests to chat completions format (#19457) * Log converting requests * Print as debug instead of info [no ci] --------- Co-authored-by: openingnow <>	2026-02-09 16:22:57 +01:00
Sascha Rogmann	292f6908cd	spec : remove check rate (#19377) * spec: remove parameter spec-ngram-check-rate * spec : renamed statistics vars * spec : add n_call_begin, n_call_accept * spec : don't enable key-map-stats	2026-02-09 15:30:50 +02:00
Georgi Gerganov	eb449cdfa4	server : improve context checkpoint logic (#19408)	2026-02-08 09:40:04 +02:00
Georgi Gerganov	dfde5993ea	common : add common_speculative_is_compat() (#19270) * llama : add llama_memory_can_rm_suffix() * Revert "llama : add llama_memory_can_rm_suffix()" This reverts commit `d30e59b62a`. * spec : check if the target context is compatible for spec decoding	2026-02-06 16:47:22 +02:00
Matthieu Coudron	a3fa035822	server: print actual model name in 'model not found" error (#19117) Experimenting with AI, my environment gets messy fast and it's not always easy to know what model my software is trying to load. This helps with troubleshooting. before: Error: { code = 400, message = "model not found", type = "invalid_request_error" } After: Error: { code = 400, message = "model 'toto' not found", type = "invalid_request_error" }	2026-02-02 16:55:27 +01:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
Georgi Gerganov	bbada8bfb9	server : wrap around the "id_slot" parameter (#19207) * server : wrap around the "id_slot" parameter * cont : minor	2026-01-30 19:46:10 +02:00
Georgi Gerganov	dabaa2e77a	spec : add ngram-mod (#19164) * spec : add ngram-mod * cont : simplify + keep track of occupancy * cont : cleanup * cont : move initialization to common/speculative * cont : cleanup * cont : cleanup * cont : fix	2026-01-30 18:21:48 +02:00
Andrew Marshall	84b0a98319	webui: Update Svelte to fix effect_update_depth_exceeded errors (#19144) The upstream fix is first available in 5.38.2, so constrain to at least that version. Rebuild pre-compiled webui index.html.gz based on these changes. See also: https://github.com/ggml-org/llama.cpp/issues/16347 https://github.com/huntabyte/bits-ui/issues/1687 https://github.com/sveltejs/svelte/issues/16548	2026-01-29 15:56:39 +01:00
Sascha Rogmann	72d3b1898a	spec : add self‑speculative decoding (no draft model required) + refactor (#18471) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-28 19:42:42 +02:00
Georgi Gerganov	b931f81b5a	server : adjust spec tests to generate up to 16 tokens (#19093)	2026-01-28 09:11:40 +02:00
Daniel Bevenius	16639ba217	common : use two decimal places for float arg help messages (#19048) * common : use two decimal places for float arg help messages This commit updates the help messages for various command-line arguments in arg.cpp to display floating-point default values with two decimal places instead of one. The motivation for this change is that currently only having one decimal place means that values generated using --help or llama-gen-docs will not display the correct values. For example, currently the value of top-p in tools/server/README.md is `0.9`, but the default value is actually '0.95'. And running llama-gen-docs does not update this value as it uses the output from the help message, which shows only one decimal place, so the values look like they are unchanged. * docs : run llama-gen-docs to update docs	2026-01-25 07:31:42 +01:00
Xuan-Son Nguyen	51fa458a92	server : support preserving reasoning_content in assistant message (#18994) * support reasoning_content input * report template caps to webui * add docs * rm commented code	2026-01-22 21:30:06 +01:00
Xuan-Son Nguyen	4e595b250a	server: do not log certain endpoints (avoid log spam) (#19028)	2026-01-22 19:24:37 +01:00
손희준	c6926d1d95	server: Reorder methods in `server-task.cpp` (#19016) * Move `task_result_state::update_chat_msg` to match with header * Move `server_task_result_cmpl_partial::to_json_anthropic()` to match with header --------- Co-authored-by: openingnow <>	2026-01-22 14:36:04 +01:00
Hendrik Erz	3802d3c78f	fix: Use `tabular-nums` for chat message statistics (#18915) * fix: Use `tabular-nums` for chat message statistics * fix: Rebuild WebUI	2026-01-21 18:46:01 +01:00
손희준	fbbf3ad190	server: /v1/responses (partial) (#18486) * from previous PR * Make instruction(system) as first message * Convert [input_message] (text/image/file) * Rename convert_responses_to_chatcmpl(body) -> response_body * Initial tool call support * Erase instructions field from chatcmpl body * Feed reasoning texts to chat template * Use std::vector instead of opaque json array * Make output_item.added events consistent * Move `server_task_result_cmpl_partial::update` from header to source * Match ID of output_item.added and .done events * Add function_call only if there is no "fc_" prefix * Add function call output at non-streaming API * Test if ID is persistent * Add doc * Fix style - use trailing comma * Rewrite state management * catch up with upstream/master * Fix style - "type" is the first item of SSE data * Explicitly check "instructions" from response_body * Make lambdas static * Check if reasoning content exists * Add `oai_resp_id` to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final * Reject `input_file` since it is not supported by chatcmpl * Add "fc_" prefix to non-streaming function call id as coderabbit pointed out --------- Co-authored-by: openingnow <>	2026-01-21 17:47:23 +01:00
Adrien Gallouët	1c7cf94b22	common, server : use the same User-Agent by default (#18957) This commit also ensures that if a custom User-Agent is used, it will be the only one sent. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-20 18:28:43 +01:00
Xuan-Son Nguyen	2c1f199653	cli : fix reasoning responses in CLI (#18961) * cli : fix reasoning responses in CLI * fix build * fix build (2)	2026-01-20 18:23:25 +01:00
Xuan-Son Nguyen	6df686bee6	server : refactor oai_parser_opt, move it to server_chat_params (#18937) * server_chat_params * move chat format into CLI * use meta whenever possible * clean up, no more chatml fallback	2026-01-19 23:28:01 +01:00
Lennart Austenfeld	18361c579c	server: fix memory reservations in populate_token_probs (#18787)	2026-01-19 19:13:31 +01:00

563 Commits