555 Commits

Author SHA1 Message Date
Aleksander Grygier b763a4cc69 feat: Add image load error fallback in MarkdownContent 2026-02-13 13:17:32 +01:00
Aleksander Grygier af9a76b6dc feat: Implement lazy MCP client shutdown 2026-02-13 13:17:32 +01:00
Aleksander Grygier c7870a3903 feat: Enhance tool call streaming UI and output format 2026-02-13 13:17:32 +01:00
Aleksander Grygier fb5e464fe7 feat: Display and manage servers in ChatForm actions 2026-02-13 13:17:32 +01:00
Aleksander Grygier dc7a3f33ba feat: Integrate server management dialog into chat settings 2026-02-13 13:03:15 +01:00
Aleksander Grygier 0b13c95519 feat: Implement dedicated server management UI components 2026-02-13 13:03:15 +01:00
Aleksander Grygier 8df7e4a54f refactor: Centralize health check logic in store 2026-02-13 13:03:15 +01:00
Aleksander Grygier 9a8cae462e feat: Enhance server config with headers and schema normalization 2026-02-13 13:03:15 +01:00
Aleksander Grygier bc2d879dea feat: Add McpLogo Svelte component 2026-02-13 13:03:15 +01:00
Aleksander Grygier 42d52605d9 refactor: Consolidate UI CSS classes into shared module 2026-02-13 13:03:15 +01:00
Aleksander Grygier 6c95020b06 chore: update webui build output 2026-02-13 12:57:23 +01:00
Aleksander Grygier 62dbc9f654 feat: Raw LLM output switch per message 2026-02-13 12:57:23 +01:00
Aleksander Grygier 284425097b refactor: Tool call handling 2026-02-13 12:57:03 +01:00
Aleksander Grygier 5beeb88a37 docs: Update high-level architecture diagrams for MCP integration 2026-02-13 12:55:42 +01:00
Aleksander Grygier acdd30e3af feat: Add AgenticContent component for enhanced tool call rendering 2026-02-13 12:55:42 +01:00
Aleksander Grygier 49a8c8b148 refactor: Update ChatStore to leverage mcpStore for agentic flow 2026-02-13 12:55:42 +01:00
Aleksander Grygier 5b582beb75 feat: Implement agentic orchestration within ChatService 2026-02-13 12:55:03 +01:00
Aleksander Grygier 391479edb2 feat: Introduce reactive mcpStore for client lifecycle management 2026-02-13 12:55:03 +01:00
Aleksander Grygier 7e184c174d feat: Refactor MCP client to use official SDK 2026-02-13 12:55:03 +01:00
Aleksander Grygier 1a041a5b9b feat: Add @modelcontextprotocol/sdk and zod dependencies 2026-02-13 12:55:03 +01:00
Aleksander Grygier 2325d2a50d refactor: Update Agentic and MCP config parsing to use new utils and constants 2026-02-13 12:55:03 +01:00
Aleksander Grygier 0c24db3178 feat: Centralize MCP and Agentic type definitions and constants 2026-02-13 12:55:02 +01:00
Aleksander Grygier 26a19183b7 feat: Introduce common utility functions 2026-02-13 12:55:02 +01:00
Pascal 14f6728ef1 webui: use normalizedMessages after upstream refactor 2026-02-13 12:55:02 +01:00
Pascal cb99ed9f71 webui: MCP client with low coupling to current codebase 2026-02-13 12:55:02 +01:00
Sigbjørn Skjæret b2ecc0cdb4
support --verbose-prompt (#19576) 2026-02-13 12:49:10 +01:00
Aleksander Grygier 5174d7206f
webui: UI and routing fixes (#19586)
* chore: update webui build output

* chore: update webui build output

* fix: Scroll issues in DropdownMenuSearchable

* webui: fix redirect to root ignoring base path

* fix: Word wrapping

* fix: remove obsolete modality UI tests causing CI failures

- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)

* feat: Improve formatting performance time

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Aleksander Grygier 4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output (#19571) 2026-02-12 19:55:51 +01:00
Aleksander Grygier 4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat (#19556)
* feat: Enable adding System Prompt per-chat

* fix: Save draft message in Chat Form when adding System Prompt from new chat view

* fix: Proper system message deletion logic

* chore: Formatting

* chore: update webui build output
2026-02-12 13:56:08 +01:00
Aleksander Grygier f486ce9f30
(webui) REFACTOR: UI primitives and polish (#19551)
* webui: UI primitives and polish (non-MCP)

* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier 38adc7d469
WebUI Architecture Cleanup (#19541)
* webui: architecture foundation (non-MCP core refactors)

* chore: update webui build output
2026-02-12 11:22:27 +01:00
RichardScottOZ fa16e517a3
server : fix typo in README.md for features list (#19510)
extra l for full
2026-02-12 08:56:25 +01:00
AesSedai e463bbdf65
model: Add Kimi-K2.5 support (#19170)
* Move dequant_model to after the text_config merge
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key

* Fix a couple of oversights

* Add image support for Kimi-K2.5

* Revert changes to KimiVLForConditionalGeneration

* Fix an assert crash

* Fix permute swapping w / h by accident

* Kimi-K2.5: Use merged QKV for vision

* Kimi-K2.5: pre-convert vision QK to use build_rope_2d

* Kimi-K2.5: support non-interleaved rope for vision

* Kimi-K2.5: fix min / max pixel

* Kimi-K2.5: remove v/o permutes, unnecessary

* Kimi-K2.5: update permute name to match

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-11 16:47:30 +01:00
Georgi Gerganov 6d95707827
model : fix wavtokenizer embedding notions (#19479) 2026-02-11 07:52:20 +02:00
JJJYmmm fc0fe40049
models : support qwen3.5 series (#19468)
* support qwen3.5 series

* remove deepstack for now, and some code clean

* code clean

* add FULL_ATTENTION_INTERVAL metadata

* code clean

* reorder v heads for linear attention to avoid expensive interleaved repeat
2026-02-10 18:00:26 +02:00
Daniel Bevenius 66d403c480
tts : fix typos in README.md [no ci] (#19463) 2026-02-10 07:30:41 +01:00
Tarek Dakhran 262364e31d
mtmd: Implement tiling for LFM2-VL (#19454) 2026-02-09 17:30:32 +01:00
손희준 820ebfa6f4
Server: log when converting requests to chat completions format (#19457)
* Log converting requests

* Print as debug instead of info [no ci]

---------

Co-authored-by: openingnow <>
2026-02-09 16:22:57 +01:00
Sascha Rogmann 292f6908cd
spec : remove check rate (#19377)
* spec: remove parameter spec-ngram-check-rate

* spec : renamed statistics vars

* spec : add n_call_begin, n_call_accept

* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Adrien Gallouët 5fa1c190d9
rpc : update from common.cpp (#19400)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-08 09:06:45 +01:00
Georgi Gerganov eb449cdfa4
server : improve context checkpoint logic (#19408) 2026-02-08 09:40:04 +02:00
ddh0 5999b50eb0
llama-quantize : cleanup `--help` output (#19317)
* cleanup `llama-quantize --help` output

some much needed TLC

* remove future argument

oops, spoiler

* cleanup of cleanup
2026-02-08 09:22:38 +02:00
Georgi Gerganov dfde5993ea
common : add common_speculative_is_compat() (#19270)
* llama : add llama_memory_can_rm_suffix()

* Revert "llama : add llama_memory_can_rm_suffix()"

This reverts commit d30e59b62a.

* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Daniel Bevenius 25f40ca65f
completion : simplify batch (embd) processing (#19286)
* completion : simplify batch (embd) processing

This commit simplifies the processing of embd by removing the for loop
that used params.n_batch as its increment. It also removes the clamping
of n_eval, since the size of embd is always at most params.n_batch.

The motivation is clarity: read in isolation, the for loop suggests that
multiple batches can be processed, which is never the case.

* add an assert to verify n_eval is not greater than n_batch
2026-02-04 05:43:28 +01:00
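
The simplification described in 25f40ca65f can be illustrated with a small, self-contained C++ sketch. It is not the code from the commit: `DecodeLog`, `process_chunked` and `process_single` are hypothetical names, and `DecodeLog::decode` merely stands in for the real decode call by recording the chunk sizes it receives. The sketch assumes, as the commit message states, that embd never holds more than n_batch tokens.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the real decode call: it only records
// the size of every chunk it is asked to evaluate.
struct DecodeLog {
    std::vector<int> chunk_sizes;
    void decode(const int * /*tokens*/, int n_eval) { chunk_sizes.push_back(n_eval); }
};

// Before: step through embd in increments of n_batch and clamp n_eval.
inline void process_chunked(const std::vector<int> & embd, int n_batch, DecodeLog & log) {
    for (int i = 0; i < (int) embd.size(); i += n_batch) {
        int n_eval = (int) embd.size() - i;
        if (n_eval > n_batch) {
            n_eval = n_batch;
        }
        log.decode(embd.data() + i, n_eval);
    }
}

// After: embd is assumed to fit in one batch, so a single call is enough.
// The assert documents (and checks) that assumption.
inline void process_single(const std::vector<int> & embd, int n_batch, DecodeLog & log) {
    if (embd.empty()) {
        return;
    }
    const int n_eval = (int) embd.size();
    assert(n_eval <= n_batch);
    log.decode(embd.data(), n_eval);
}
```

As long as the size assumption holds, both functions issue the same decode calls, which is why the loop and the clamp can be dropped without changing behavior.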
Xuan-Son Nguyen 07a7412a3b
mtmd: add min/max pixels gguf metadata (#19273) 2026-02-02 20:59:06 +01:00
Matthieu Coudron a3fa035822
server: print actual model name in 'model not found' error (#19117)
When experimenting with AI, my environment gets messy fast, and it's not
always easy to know which model my software is trying to load. This helps
with troubleshooting.

Before:

Error: {
  code = 400,
  message = "model not found",
  type = "invalid_request_error"
}

After:

Error: {
  code = 400,
  message = "model 'toto' not found",
  type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
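
The before/after output in a3fa035822 amounts to interpolating the requested model name into the error message. A minimal C++ sketch of that idea follows; `model_not_found_message` is a hypothetical helper introduced here for illustration and is not taken from the server code.

```cpp
#include <string>

// Hypothetical helper: build the error text with the requested
// model name quoted inside it, matching the "After" output above.
inline std::string model_not_found_message(const std::string & model_name) {
    return "model '" + model_name + "' not found";
}
```

Called with the name from the example, `model_not_found_message("toto")` yields the message shown in the "After" block.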
Christian Kastner 7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00
EugeoSynthesisThirtyTwo 3dd95914d0
quantize: add option --tensor-type-file to llama-quantize (#18572)
* add option --tensor-type-file to llama-quantize, but it raises an error.

* add error message when file not found

* quantize: update help menu, fix CI

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
tc-mb ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) (#19211)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Georgi Gerganov bbada8bfb9
server : wrap around the "id_slot" parameter (#19207)
* server : wrap around the "id_slot" parameter

* cont : minor
2026-01-30 19:46:10 +02:00