llama.cpp

Commit Graph

Author	SHA1	Message	Date
Pascal	20e5e70c61	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	a2cce59d69	fix: acurate tool_response display	2026-02-13 13:21:35 +01:00
Pascal	fdd67f45e6	fix: unify MCP server label logic with simplified fallback	2026-02-13 13:21:35 +01:00
Pascal	bdd9bcfb75	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	a515179730	refactor: remove multimodal validation from model selector Remove all frontend validation logic that prevented users from selecting models based on multimodal capabilities. This refactoring removes restrictive UI code while maintaining full functionality - Vision models can describe images as text - That text remains useful for non-vision models - Chaining vision -> non-vision is a valid workflow - Users know their use case better than the UI - Users can return to vision models when needed	2026-02-13 13:21:35 +01:00
Pascal	c7e76c65d1	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	37c084873c	fix: ignore assistant attachments (MCP) for modality detection	2026-02-13 13:21:35 +01:00
Pascal	d09cdfaf0a	chore: update webui build output	2026-02-13 13:21:35 +01:00
Pascal	6d41f74031	refactor: eliminate MCP circular dependency - Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore - Remove bidirectional callback pattern (setCallback, notify methods) - Add updateState/updateHealthCheck public methods in mcpStore - Replace callback calls with direct mcpStore method calls - Remove unused imports (browser, HealthCheckState) and constructor - Fixes CI: ReferenceError Cannot access mcpClient before initialization	2026-02-13 13:21:35 +01:00
Pascal	07ae189175	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	23741b3c6a	fix: strip reasoning content and UI proprietary tags from prompts TODO: add toggle and ensure backend API compliance for reasoning format	2026-02-13 13:21:34 +01:00
Pascal	b5b527fa52	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	fb1ec29898	refactor: remove reasoning after first turn filter	2026-02-13 13:21:34 +01:00
Pascal	fc5d9f587f	refactor: inline reasoning with tags, remove fixed thinking field	2026-02-13 13:21:34 +01:00
Pascal	6b3bc23fc2	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	c73baed7e3	feat: resolve MCP attachment images via rehype plugin LLM can reference tool-generated images using markdown links like, plugin resolves attachment names to base64 from message.extra when present, regular HTTP/data URLs pass through unchanged (no regression) - rehypeResolveAttachmentImages plugin in markdown pipeline - Pass message prop to MarkdownContent and AgenticContent - Force processor reactivity on message.extra changes - Filter assistant images from API context (display-only)	2026-02-13 13:21:34 +01:00
Pascal	09381a59fd	feat: persist base64 attachments from tool results	2026-02-13 13:21:34 +01:00
Pascal	f16457551e	webui: fix custom headers persistence in UI (derived)	2026-02-13 13:21:34 +01:00
Pascal	f42e5f114e	webui: fix custom headers persistence in UI	2026-02-13 13:21:34 +01:00
Aleksander Grygier	162bd976ed	fix: Word wrapping	2026-02-13 13:21:34 +01:00
Aleksander Grygier	c2dd1d2fed	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	008463149b	feat: UI improvements	2026-02-13 13:21:34 +01:00
Aleksander Grygier	1dba2ec4a9	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	805c171825	feat: UI improvement	2026-02-13 13:21:34 +01:00
Aleksander Grygier	d6455a7530	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	bb4bd7fe09	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	05dfb5e70c	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	cad9ca1208	feat: MCP Server Details	2026-02-13 13:21:34 +01:00
Aleksander Grygier	0e980bf881	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	825d2ea9a9	feat: MCP connection details WIP	2026-02-13 13:21:34 +01:00
Aleksander Grygier	2b37f70c37	refactor: MCP types and health check	2026-02-13 13:21:34 +01:00
Aleksander Grygier	36a37d1794	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	38ba6d8372	refactor: KeyValuePairs component	2026-02-13 13:21:34 +01:00
Aleksander Grygier	c5465d4893	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	57089370e4	refactor: DRY	2026-02-13 13:21:34 +01:00
Aleksander Grygier	f80d5f615e	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	e1da51335c	refactor: Architecture improvements	2026-02-13 13:21:34 +01:00
Aleksander Grygier	3bc8d93546	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	48b2b1b2f0	refactor: MCP state management + stores/clients relationship	2026-02-13 13:21:34 +01:00
Aleksander Grygier	2cd682178b	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	da8baaa9b8	fix: Distinguish streaming vs incomplete tool calls in UI	2026-02-13 13:21:34 +01:00
Aleksander Grygier	3179858e5f	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	9471729162	fix: Restore live reactive UI progress for tool calls	2026-02-13 13:21:34 +01:00
Aleksander Grygier	64923b20be	chore: update webui build output	2026-02-13 13:21:34 +01:00
Pascal	179477b4ed	fix: reset tool call state between turns	2026-02-13 13:21:34 +01:00
Pascal	38244a1bfa	webui: enable streaming of tool call arguments	2026-02-13 13:21:34 +01:00
Aleksander Grygier	2faf237d01	chore: update webui build output	2026-02-13 13:21:34 +01:00
Aleksander Grygier	5ffb6aba3a	refactor: Cleanup	2026-02-13 13:21:34 +01:00
Pascal	96e51e2a41	webui: prevent mobile dropdown immediate close on synthetic click	2026-02-13 13:20:42 +01:00
Pascal	8916698294	webui: fix redirect to root ignoring base path	2026-02-13 13:20:42 +01:00
Aleksander Grygier	2a33fc2059	refactor: Cleanup	2026-02-13 13:20:41 +01:00
Aleksander Grygier	04913f20d9	chore: update webui build output	2026-02-13 13:20:41 +01:00
Aleksander Grygier	939e7aa16b	refactor: Types	2026-02-13 13:20:41 +01:00
Aleksander Grygier	bef865d871	refactor: Componentize McpServerCard	2026-02-13 13:20:41 +01:00
Aleksander Grygier	7dbb05a160	refactor: Cleanup	2026-02-13 13:20:41 +01:00
Aleksander Grygier	7e194f653a	fix: Remove redundant CSS class	2026-02-13 13:20:41 +01:00
Aleksander Grygier	02c87fa3c9	feat: Add TruncatedText component	2026-02-13 13:20:41 +01:00
Aleksander Grygier	27b80ae3e8	fix: Collapsible box trigger	2026-02-13 13:20:26 +01:00
Aleksander Grygier	408e098324	refactor: Cleanup	2026-02-13 13:20:26 +01:00
Aleksander Grygier	0b36d04c38	refactor: Cleanup	2026-02-13 13:20:07 +01:00
Aleksander Grygier	df464c1f5a	refactor: Collapsible Content Block & small fixes	2026-02-13 13:18:20 +01:00
Aleksander Grygier	26044454ef	chore: update webui build output	2026-02-13 13:18:20 +01:00
Aleksander Grygier	f0ac6fa039	refactor: Cleanup	2026-02-13 13:18:20 +01:00
Aleksander Grygier	7c9ba36216	chore: update webui build output	2026-02-13 13:18:20 +01:00
Aleksander Grygier	7ab269cd77	feat: UI improvements	2026-02-13 13:18:20 +01:00
Aleksander Grygier	e0122465ed	feat: Always show Mcp Selector	2026-02-13 13:18:20 +01:00
Pascal	36c9ad9303	fix: remove double scrollbar in model selector by using Bits UI content available height	2026-02-13 13:18:20 +01:00
Aleksander Grygier	bc60beb1a7	feat: Enable adding System Prompt per-chat	2026-02-13 13:18:20 +01:00
Aleksander Grygier	276a3e9416	fix: UI	2026-02-13 13:17:51 +01:00
Aleksander Grygier	c74065de75	chore: update webui build output	2026-02-13 13:17:51 +01:00
Aleksander Grygier	e6ad864984	feat: UI improvements	2026-02-13 13:17:51 +01:00
Pascal	cff237cb3e	webui: raw tool result display, strip only leading/trailing newlines to preserve indentation	2026-02-13 13:17:33 +01:00
Pascal	afb79b2970	webui: split raw output into backend parsing and frontend display options	2026-02-13 13:17:33 +01:00
Pascal	18efdabb12	webui: remove legacy wrapper and restore WebSocket transport	2026-02-13 13:17:33 +01:00
Pascal	a13782a4d1	webui: remove unused imports	2026-02-13 13:17:33 +01:00
Aleksander Grygier	d548bf27dd	chore: update webui build output	2026-02-13 13:17:33 +01:00
Aleksander Grygier	bdd5958f6d	feat: Improve agentic tool call streaming display with 'in progress' state	2026-02-13 13:17:32 +01:00
Aleksander Grygier	a9c2ea7a8e	feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides	2026-02-13 13:17:32 +01:00
Aleksander Grygier	dfce09b34b	feat: Add per-chat MCP server overrides	2026-02-13 13:17:32 +01:00
Aleksander Grygier	54374edecd	chore: update webui build output	2026-02-13 13:17:32 +01:00
Aleksander Grygier	b763a4cc69	feat: Add image load error fallback in MarkdownContent	2026-02-13 13:17:32 +01:00
Aleksander Grygier	af9a76b6dc	feat: Implement lazy MCP client shutdown	2026-02-13 13:17:32 +01:00
Aleksander Grygier	c7870a3903	feat: Enhance tool call streaming UI and output format	2026-02-13 13:17:32 +01:00
Aleksander Grygier	fb5e464fe7	feat: Display and manage servers in ChatForm actions	2026-02-13 13:17:32 +01:00
Aleksander Grygier	dc7a3f33ba	feat: Integrate server management dialog into chat settings	2026-02-13 13:03:15 +01:00
Aleksander Grygier	0b13c95519	feat: Implement dedicated server management UI components	2026-02-13 13:03:15 +01:00
Aleksander Grygier	8df7e4a54f	refactor: Centralize health check logic in store	2026-02-13 13:03:15 +01:00
Aleksander Grygier	9a8cae462e	feat: Enhance server config with headers and schema normalization	2026-02-13 13:03:15 +01:00
Aleksander Grygier	bc2d879dea	feat: Add McpLogo Svelte component	2026-02-13 13:03:15 +01:00
Aleksander Grygier	42d52605d9	refactor: Consolidate UI CSS classes into shared module	2026-02-13 13:03:15 +01:00
Aleksander Grygier	6c95020b06	chore: update webui build output	2026-02-13 12:57:23 +01:00
Aleksander Grygier	62dbc9f654	feat: Raw LLM output switch per message	2026-02-13 12:57:23 +01:00
Aleksander Grygier	284425097b	refactor: Tool call handling	2026-02-13 12:57:03 +01:00
Aleksander Grygier	5beeb88a37	docs: Update high-level architecture diagrams for MCP integration	2026-02-13 12:55:42 +01:00
Aleksander Grygier	acdd30e3af	feat: Add AgenticContent component for enhanced tool call rendering	2026-02-13 12:55:42 +01:00
Aleksander Grygier	49a8c8b148	refactor: Update ChatStore to leverage mcpStore for agentic flow	2026-02-13 12:55:42 +01:00
Aleksander Grygier	5b582beb75	feat: Implement agentic orchestration within ChatService	2026-02-13 12:55:03 +01:00
Aleksander Grygier	391479edb2	feat: Introduce reactive mcpStore for client lifecycle management	2026-02-13 12:55:03 +01:00
Aleksander Grygier	7e184c174d	feat: Refactor MCP client to use official SDK	2026-02-13 12:55:03 +01:00
Aleksander Grygier	1a041a5b9b	feat: Add @modelcontextprotocol/sdk and zod dependencies	2026-02-13 12:55:03 +01:00
Aleksander Grygier	2325d2a50d	refactor: Update Agentic and MCP config parsing to use new utils and constants	2026-02-13 12:55:03 +01:00
Aleksander Grygier	0c24db3178	feat: Centralize MCP and Agentic type definitions and constants	2026-02-13 12:55:02 +01:00
Aleksander Grygier	26a19183b7	feat: Introduce common utility functions	2026-02-13 12:55:02 +01:00
Pascal	14f6728ef1	webui: use normalizedMessages after upstream refactor	2026-02-13 12:55:02 +01:00
Pascal	cb99ed9f71	webui: MCP client with low coupling to current codebase	2026-02-13 12:55:02 +01:00
Sigbjørn Skjæret	b2ecc0cdb4	support --verbose-prompt (#19576)	2026-02-13 12:49:10 +01:00
Aleksander Grygier	5174d7206f	webui: UI and routing fixes (#19586) * chore: update webui build output * chore: update webui build output * fix: Scroll issues in DropdownMenuSearchable * webui: fix redirect to root ignoring base path * fix: Word wrapping * fix: remove obsolete modality UI tests causing CI failures - Remove VisionModality/AudioModality test stories - Remove mockServerProps usage and imports - Simplify Default test (remove dropdown interaction checks) - Simplify FileAttachments test (remove mocks) * feat: Improve formatting performance time --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-02-13 12:31:00 +01:00
Aleksander Grygier	4c61875bf8	webui: Add switcher to Chat Message UI to show raw LLM output (#19571)	2026-02-12 19:55:51 +01:00
Aleksander Grygier	4d688f9ebb	(webui) FEATURE: Enable adding or injecting System Message into chat (#19556) * feat: Enable adding System Prompt per-chat * fix: Save draft message in Chat Form when adding System Prompt from new chat view * fix: Proper system message deletion logic * chore: Formatting * chore: update webui build output	2026-02-12 13:56:08 +01:00
Aleksander Grygier	f486ce9f30	(webui) REFACTOR: UI primitives and polish (#19551) * webui: UI primitives and polish (non-MCP) * chore: update webui build output	2026-02-12 12:21:00 +01:00
Aleksander Grygier	38adc7d469	WebUI Architecture Cleanup (#19541) * webui: architecture foundation (non-MCP core refactors) * chore: update webui build output	2026-02-12 11:22:27 +01:00
RichardScottOZ	fa16e517a3	server : fix typo in README.md for features list (#19510) extra l for full	2026-02-12 08:56:25 +01:00
AesSedai	e463bbdf65	model: Add Kimi-K2.5 support (#19170) * Move dequant_model to after the text_config merge Add new kimi-k2.5 keys to mtmd convert Update V_MMPROJ tensor mapping for new mm_projector.proj keys Update V_M_IMP_NORM for new mm_projector.pre_norm key * Fix a couple of oversights * Add image support for Kimi-K2.5 * Revert changes to KimiVLForConditionalGeneration * Fix an assert crash * Fix permute swapping w / h on accident * Kimi-K2.5: Use merged QKV for vision * Kimi-K2.5: pre-convert vision QK to use build_rope_2d * Kimi-K2.5: support non-interleaved rope for vision * Kimi-K2.5: fix min / max pixel * Kimi-K2.5: remove v/o permutes, unnecessary * Kimi-K2.5: update permute name to match * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-02-11 16:47:30 +01:00
Georgi Gerganov	6d95707827	model : fix wavtokenizer embedding notions (#19479)	2026-02-11 07:52:20 +02:00
JJJYmmm	fc0fe40049	models : support qwen3.5 series (#19468) * support qwen3.5 series * remove deepstack for now, and some code clean * code clean * add FULL_ATTENTION_INTERVAL metadata * code clean * reorder v heads for linear attention to avoid expensive interleaved repeat	2026-02-10 18:00:26 +02:00
Daniel Bevenius	66d403c480	tts : fix typos in README.md [no ci] (#19463)	2026-02-10 07:30:41 +01:00
Tarek Dakhran	262364e31d	mtmd: Implement tiling for LFM2-VL (#19454)	2026-02-09 17:30:32 +01:00
손희준	820ebfa6f4	Server: log when converting requests to chat completions format (#19457) * Log converting requests * Print as debug instead of info [no ci] --------- Co-authored-by: openingnow <>	2026-02-09 16:22:57 +01:00
Sascha Rogmann	292f6908cd	spec : remove check rate (#19377) * spec: remove parameter spec-ngram-check-rate * spec : renamed statistics vars * spec : add n_call_begin, n_call_accept * spec : don't enable key-map-stats	2026-02-09 15:30:50 +02:00
Adrien Gallouët	5fa1c190d9	rpc : update from common.cpp (#19400) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-08 09:06:45 +01:00
Georgi Gerganov	eb449cdfa4	server : improve context checkpoint logic (#19408)	2026-02-08 09:40:04 +02:00
ddh0	5999b50eb0	llama-quantize : cleanup `--help` output (#19317) * cleanup `llama-quantize --help` output some much needed TLC * remove future argument oops, spoiler * cleanup of cleanup	2026-02-08 09:22:38 +02:00
Georgi Gerganov	dfde5993ea	common : add common_speculative_is_compat() (#19270) * llama : add llama_memory_can_rm_suffix() * Revert "llama : add llama_memory_can_rm_suffix()" This reverts commit `d30e59b62a`. * spec : check if the target context is compatible for spec decoding	2026-02-06 16:47:22 +02:00
Daniel Bevenius	25f40ca65f	completion : simplify batch (embd) processing (#19286) * completion : simplify batch (embd) processing This commit simplifies the processing of embd by removing the for loop that currently exists which uses params.n_batch as its increment. This commit also removes the clamping of n_eval as the size of embd is always at most the size of params.n_batch. The motivation is to clarify the code as it is currently a little confusing when looking at this for loop in isolation and thinking that it can process multiple batches. * add an assert to verify n_eval is not greater than n_batch	2026-02-04 05:43:28 +01:00
Xuan-Son Nguyen	07a7412a3b	mtmd: add min/max pixels gguf metadata (#19273)	2026-02-02 20:59:06 +01:00
Matthieu Coudron	a3fa035822	server: print actual model name in 'model not found" error (#19117) Experimenting with AI, my environment gets messy fast and it's not always easy to know what model my software is trying to load. This helps with troubleshooting. before: Error: { code = 400, message = "model not found", type = "invalid_request_error" } After: Error: { code = 400, message = "model 'toto' not found", type = "invalid_request_error" }	2026-02-02 16:55:27 +01:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
EugeoSynthesisThirtyTwo	3dd95914d0	quantize: add option --tensor-type-file to llama-quantize (#18572) * add option --tensor-type-file to llama-quantize, but it raises an error. * add error message when file not found * quantize: update help menu, fix CI Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>	2026-01-31 11:39:21 +08:00
tc-mb	ec6c7421e4	mtmd: support MiniCPM-o 4.5(vision only) (#19211) Signed-off-by: tc-mb <caitianchi@modelbest.cn>	2026-01-30 23:19:30 +01:00
Georgi Gerganov	bbada8bfb9	server : wrap around the "id_slot" parameter (#19207) * server : wrap around the "id_slot" parameter * cont : minor	2026-01-30 19:46:10 +02:00
Georgi Gerganov	dabaa2e77a	spec : add ngram-mod (#19164) * spec : add ngram-mod * cont : simplify + keep track of occupancy * cont : cleanup * cont : move initialization to common/speculative * cont : cleanup * cont : cleanup * cont : fix	2026-01-30 18:21:48 +02:00
Andrew Marshall	84b0a98319	webui: Update Svelte to fix effect_update_depth_exceeded errors (#19144) The upstream fix is first available in 5.38.2, so constrain to at least that version. Rebuild pre-compiled webui index.html.gz based on these changes. See also: https://github.com/ggml-org/llama.cpp/issues/16347 https://github.com/huntabyte/bits-ui/issues/1687 https://github.com/sveltejs/svelte/issues/16548	2026-01-29 15:56:39 +01:00
Sascha Rogmann	72d3b1898a	spec : add self‑speculative decoding (no draft model required) + refactor (#18471) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-28 19:42:42 +02:00
Georgi Gerganov	b931f81b5a	server : adjust spec tests to generate up to 16 tokens (#19093)	2026-01-28 09:11:40 +02:00
Georgi Gerganov	080b161995	completion : fix prompt cache for recurrent models (#19045)	2026-01-25 09:12:50 +02:00
Daniel Bevenius	16639ba217	common : use two decimal places for float arg help messages (#19048) * common : use two decimal places for float arg help messages This commit updates the help messages for various command-line arguments in arg.cpp to display floating-point default values with two decimal places instead of one. The motivation for this changes is that currently only having one decimal place means that values generated using --help or llama-gen-docs will not display the correct values. For example, currently the value of top-p in tools/server/README.md is `0.9`, but the default value is actually '0.95'. And running llama-gen-docs does not update this value as it uses the output from the help message, which shows only one decimal place, so the values look like they are unchanged. * docs : run llama-gen-docs to update docs	2026-01-25 07:31:42 +01:00
Johannes Gäßler	e9fd8dcab4	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00
Aldehir Rojas	a3e812811d	cli : load parser definition (#19031) * cli : load parser definition * cont : only unload if a parser is defined	2026-01-22 20:31:22 -06:00
Xuan-Son Nguyen	51fa458a92	server : support preserving reasoning_content in assistant message (#18994) * support reasoning_content input * report template caps to webui * add docs * rm commented code	2026-01-22 21:30:06 +01:00
Xuan-Son Nguyen	4e595b250a	server: do not log certain endpoints (avoid log spam) (#19028)	2026-01-22 19:24:37 +01:00
Xuan-Son Nguyen	9eb5bfec1a	mtmd : update docs to use llama_model_n_embd_inp (#18999)	2026-01-22 14:36:32 +01:00
손희준	c6926d1d95	server: Reorder methods in `server-task.cpp` (#19016) * Move `task_result_state::update_chat_msg` to match with header * Move `server_task_result_cmpl_partial::to_json_anthropic()` to match with header --------- Co-authored-by: openingnow <>	2026-01-22 14:36:04 +01:00
Hendrik Erz	3802d3c78f	fix: Use `tabular-nums` for chat message statistics (#18915) * fix: Use `tabular-nums` for chat message statistics * fix: Rebuild WebUI	2026-01-21 18:46:01 +01:00
손희준	fbbf3ad190	server: /v1/responses (partial) (#18486) * from previous PR * Make instruction(system) as first message * Convert [input_message] (text/image/file) * Rename convert_responses_to_chatcmpl(body) -> response_body * Initial tool call support * Erase instructions field from chatcmpl body * Feed reasoning texts to chat template * Use std::vector instead of opaque json array * Make output_item.added events consistent * Move `server_task_result_cmpl_partial::update` from header to source * Match ID of output_item.added and .done events * Add function_call only if there is no "fc_" prefix * Add function call output at non-streaming API * Test if ID is persistent * Add doc * Fix style - use trailing comma * Rewrite state management * catch up with upstream/master * Fix style - "type" is the first item of SSE data * Explicitly check "instructions" from response_body * Make lambdas static * Check if reasoning content exists * Add `oai_resp_id` to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final * Reject `input_file` since it is not supported by chatcmpl * Add "fc_" prefix to non-straming function call id as coderabbit pointed out --------- Co-authored-by: openingnow <>	2026-01-21 17:47:23 +01:00
Adrien Gallouët	1c7cf94b22	common, server : use the same User-Agent by default (#18957) This commit also ensures that if a custom User-Agent is used, it will be the only one sent. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-20 18:28:43 +01:00
Xuan-Son Nguyen	2c1f199653	cli : fix reasoning responses in CLI (#18961) * cli : fix reasoning responses in CLI * fix build * fix build (2)	2026-01-20 18:23:25 +01:00
Xuan-Son Nguyen	6df686bee6	server : refactor oai_parser_opt, move it to server_chat_params (#18937) * server_chat_params * move chat format into CLI * use meta whenever possible * clean up, no more chatml fallback	2026-01-19 23:28:01 +01:00
Lennart Austenfeld	18361c579c	server: fix memory reservations in populate_token_probs (#18787)	2026-01-19 19:13:31 +01:00
Tarek Dakhran	c945aaaef2	mtmd : Fix ASR for LFM2.5-Audio-1.5B (#18876)	2026-01-16 11:23:08 +01:00
Xuan-Son Nguyen	c15395f73c	common : implement new jinja template engine (#18462) * jinja vm * lexer * add vm types * demo * clean up * parser ok * binary_expression::execute * shadow naming * bin ops works! * fix map object * add string builtins * add more builtins * wip * use mk_val * eval with is_user_input * render gemma tmpl ok * track input string even after transformations * support binded functions * keyword arguments and slicing array * use shared_ptr for values * add mk_stmt * allow print source on exception * fix negate test * testing more templates * mostly works * add filter_statement * allow func to access ctx * add jinja-value.cpp * impl global_from_json * a lot of fixes * more tests * more fix, more tests * more fixes * rm workarounds * demo: type inferrence * add placeholder for tojson * improve function args handling * rm type inference * no more std::regex * trailing spaces * make testing more flexible * make output a bit cleaner * (wip) redirect minja calls * test: add --output * fix crash on macro kwargs * add minimal caps system * add some workarounds * rm caps_apply_workarounds * get rid of preprocessing * more fixes * fix test-chat-template * move test-chat-jinja into test-chat-template * rm test-chat-jinja from cmake * test-chat-template: use common * fix build * fix build (2) * rename vm --> interpreter * improve error reporting * correct lstrip behavior * add tojson * more fixes * disable tests for COMMON_CHAT_FORMAT_GENERIC * make sure tojson output correct order * add object.length * fully functional selectattr / rejectattr * improve error reporting * more builtins added, more fixes * create jinja rendering tests * fix testing.h path * adjust whitespace rules * more fixes * temporary disable test for ibm-granite * r/lstrip behavior matched with hf.js * minimax, glm4.5 ok * add append and pop * kimi-k2 ok * test-chat passed * fix lstrip_block * add more jinja tests * cast to unsigned char * allow dict key to be numeric * nemotron: rm windows newline * tests ok * fix test * rename interpreter --> runtime * fix build * add more checks * bring back generic format support * fix Apertus * [json.exception.out_of_range.403] key 'content' not found * rm generic test * refactor input marking * add docs * fix windows build * clarify error message * improved tests * split/rsplit with maxsplit * non-inverse maxsplit forgot to change after simplifying * implement separators for tojson and fix indent * i like to move it move it * rename null -- > none * token::eof * some nits + comments * add exception classes for lexer and parser * null -> none * rename global -> env * rm minja * update docs * docs: add input marking caveats * imlement missing jinja-tests functions * oops * support trim filter with args, remove bogus to_json reference * numerous argument fixes * updated tests * implement optional strip chars parameter * use new chars parameter * float filter also has default * always leave at least one decimal in float string * jinja : static analysis + header cleanup + minor fixes * add fuzz test * add string.cpp * fix chat_template_kwargs * nits * fix build * revert * unrevert sorry :) * add fuzz func_args, refactor to be safer * fix array.map() * loosen ensure_vals max count condition, add not impl for map(int) * hopefully fix windows * check if empty first * normalize newlines --------- Co-authored-by: Alde Rojas <hello@alde.dev> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-16 11:22:06 +01:00


735 Commits