Aleksander Grygier
dc7a3f33ba
feat: Integrate server management dialog into chat settings
2026-02-13 13:03:15 +01:00
Aleksander Grygier
0b13c95519
feat: Implement dedicated server management UI components
2026-02-13 13:03:15 +01:00
Aleksander Grygier
8df7e4a54f
refactor: Centralize health check logic in store
2026-02-13 13:03:15 +01:00
Aleksander Grygier
9a8cae462e
feat: Enhance server config with headers and schema normalization
2026-02-13 13:03:15 +01:00
Aleksander Grygier
bc2d879dea
feat: Add McpLogo Svelte component
2026-02-13 13:03:15 +01:00
Aleksander Grygier
42d52605d9
refactor: Consolidate UI CSS classes into shared module
2026-02-13 13:03:15 +01:00
Aleksander Grygier
6c95020b06
chore: update webui build output
2026-02-13 12:57:23 +01:00
Aleksander Grygier
62dbc9f654
feat: Raw LLM output switch per message
2026-02-13 12:57:23 +01:00
Aleksander Grygier
284425097b
refactor: Tool call handling
2026-02-13 12:57:03 +01:00
Aleksander Grygier
5beeb88a37
docs: Update high-level architecture diagrams for MCP integration
2026-02-13 12:55:42 +01:00
Aleksander Grygier
acdd30e3af
feat: Add AgenticContent component for enhanced tool call rendering
2026-02-13 12:55:42 +01:00
Aleksander Grygier
49a8c8b148
refactor: Update ChatStore to leverage mcpStore for agentic flow
2026-02-13 12:55:42 +01:00
Aleksander Grygier
5b582beb75
feat: Implement agentic orchestration within ChatService
2026-02-13 12:55:03 +01:00
Aleksander Grygier
391479edb2
feat: Introduce reactive mcpStore for client lifecycle management
2026-02-13 12:55:03 +01:00
Aleksander Grygier
7e184c174d
feat: Refactor MCP client to use official SDK
2026-02-13 12:55:03 +01:00
Aleksander Grygier
1a041a5b9b
feat: Add @modelcontextprotocol/sdk and zod dependencies
2026-02-13 12:55:03 +01:00
Aleksander Grygier
2325d2a50d
refactor: Update Agentic and MCP config parsing to use new utils and constants
2026-02-13 12:55:03 +01:00
Aleksander Grygier
0c24db3178
feat: Centralize MCP and Agentic type definitions and constants
2026-02-13 12:55:02 +01:00
Aleksander Grygier
26a19183b7
feat: Introduce common utility functions
2026-02-13 12:55:02 +01:00
Pascal
14f6728ef1
webui: use normalizedMessages after upstream refactor
2026-02-13 12:55:02 +01:00
Pascal
cb99ed9f71
webui: MCP client with low coupling to current codebase
2026-02-13 12:55:02 +01:00
Sigbjørn Skjæret
b2ecc0cdb4
support --verbose-prompt ( #19576 )
2026-02-13 12:49:10 +01:00
Aleksander Grygier
5174d7206f
webui: UI and routing fixes ( #19586 )
* chore: update webui build output
* chore: update webui build output
* fix: Scroll issues in DropdownMenuSearchable
* webui: fix redirect to root ignoring base path
* fix: Word wrapping
* fix: remove obsolete modality UI tests causing CI failures
- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)
* feat: Improve formatting performance time
---------
Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Aleksander Grygier
4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output ( #19571 )
2026-02-12 19:55:51 +01:00
Aleksander Grygier
4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat ( #19556 )
* feat: Enable adding System Prompt per-chat
* fix: Save draft message in Chat Form when adding System Prompt from new chat view
* fix: Proper system message deletion logic
* chore: Formatting
* chore: update webui build output
2026-02-12 13:56:08 +01:00
Aleksander Grygier
f486ce9f30
(webui) REFACTOR: UI primitives and polish ( #19551 )
* webui: UI primitives and polish (non-MCP)
* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier
38adc7d469
WebUI Architecture Cleanup ( #19541 )
* webui: architecture foundation (non-MCP core refactors)
* chore: update webui build output
2026-02-12 11:22:27 +01:00
RichardScottOZ
fa16e517a3
server : fix typo in README.md for features list ( #19510 )
removed an extra "l" in "full"
2026-02-12 08:56:25 +01:00
AesSedai
e463bbdf65
model: Add Kimi-K2.5 support ( #19170 )
* Move dequant_model to after the text_config merge
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key
* Fix a couple of oversights
* Add image support for Kimi-K2.5
* Revert changes to KimiVLForConditionalGeneration
* Fix an assert crash
* Fix permute swapping w / h on accident
* Kimi-K2.5: Use merged QKV for vision
* Kimi-K2.5: pre-convert vision QK to use build_rope_2d
* Kimi-K2.5: support non-interleaved rope for vision
* Kimi-K2.5: fix min / max pixel
* Kimi-K2.5: remove v/o permutes, unnecessary
* Kimi-K2.5: update permute name to match
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-11 16:47:30 +01:00
Georgi Gerganov
6d95707827
model : fix wavtokenizer embedding notions ( #19479 )
2026-02-11 07:52:20 +02:00
JJJYmmm
fc0fe40049
models : support qwen3.5 series ( #19468 )
* support qwen3.5 series
* remove deepstack for now, and some code clean
* code clean
* add FULL_ATTENTION_INTERVAL metadata
* code clean
* reorder v heads for linear attention to avoid expensive interleaved repeat
2026-02-10 18:00:26 +02:00
Daniel Bevenius
66d403c480
tts : fix typos in README.md [no ci] ( #19463 )
2026-02-10 07:30:41 +01:00
Tarek Dakhran
262364e31d
mtmd: Implement tiling for LFM2-VL ( #19454 )
2026-02-09 17:30:32 +01:00
손희준
820ebfa6f4
Server: log when converting requests to chat completions format ( #19457 )
* Log converting requests
* Print as debug instead of info [no ci]
---------
Co-authored-by: openingnow <>
2026-02-09 16:22:57 +01:00
Sascha Rogmann
292f6908cd
spec : remove check rate ( #19377 )
* spec: remove parameter spec-ngram-check-rate
* spec : renamed statistics vars
* spec : add n_call_begin, n_call_accept
* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Adrien Gallouët
5fa1c190d9
rpc : update from common.cpp ( #19400 )
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-08 09:06:45 +01:00
Georgi Gerganov
eb449cdfa4
server : improve context checkpoint logic ( #19408 )
2026-02-08 09:40:04 +02:00
ddh0
5999b50eb0
llama-quantize : cleanup `--help` output ( #19317 )
* cleanup `llama-quantize --help` output
some much needed TLC
* remove future argument
oops, spoiler
* cleanup of cleanup
2026-02-08 09:22:38 +02:00
Georgi Gerganov
dfde5993ea
common : add common_speculative_is_compat() ( #19270 )
* llama : add llama_memory_can_rm_suffix()
* Revert "llama : add llama_memory_can_rm_suffix()"
This reverts commit d30e59b62a.
* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Daniel Bevenius
25f40ca65f
completion : simplify batch (embd) processing ( #19286 )
* completion : simplify batch (embd) processing
This commit simplifies the processing of embd by removing the for loop
that currently exists which uses params.n_batch as its increment. This
commit also removes the clamping of n_eval as the size of embd is always
at most the size of params.n_batch.
The motivation is to clarify the code: looking at this for loop in isolation, it is easy to think it can process multiple batches, which it cannot.
* add an assert to verify n_eval is not greater than n_batch
2026-02-04 05:43:28 +01:00
Xuan-Son Nguyen
07a7412a3b
mtmd: add min/max pixels gguf metadata ( #19273 )
2026-02-02 20:59:06 +01:00
Matthieu Coudron
a3fa035822
server: print actual model name in "model not found" error ( #19117 )
While experimenting with AI, my environment gets messy fast, and it's not
always easy to know which model my software is trying to load. This helps
with troubleshooting.
before:
Error: {
code = 400,
message = "model not found",
type = "invalid_request_error"
}
After:
Error: {
code = 400,
message = "model 'toto' not found",
type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
Christian Kastner
7a4ca3cbd9
docs : Minor cleanups ( #19252 )
* Update old URLs to github.com/ggml-org/
* Bump copyrights
2026-02-02 08:38:55 +02:00
EugeoSynthesisThirtyTwo
3dd95914d0
quantize: add option --tensor-type-file to llama-quantize ( #18572 )
* add option --tensor-type-file to llama-quantize, but it raises an error.
* add error message when file not found
* quantize: update help menu, fix CI
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
tc-mb
ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) ( #19211 )
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Georgi Gerganov
bbada8bfb9
server : wrap around the "id_slot" parameter ( #19207 )
* server : wrap around the "id_slot" parameter
* cont : minor
2026-01-30 19:46:10 +02:00
Georgi Gerganov
dabaa2e77a
spec : add ngram-mod ( #19164 )
* spec : add ngram-mod
* cont : simplify + keep track of occupancy
* cont : cleanup
* cont : move initialization to common/speculative
* cont : cleanup
* cont : cleanup
* cont : fix
2026-01-30 18:21:48 +02:00
Andrew Marshall
84b0a98319
webui: Update Svelte to fix effect_update_depth_exceeded errors ( #19144 )
The upstream fix is first available in 5.38.2, so constrain to at least
that version.
Rebuild pre-compiled webui index.html.gz based on these changes.
See also:
https://github.com/ggml-org/llama.cpp/issues/16347
https://github.com/huntabyte/bits-ui/issues/1687
https://github.com/sveltejs/svelte/issues/16548
2026-01-29 15:56:39 +01:00
Sascha Rogmann
72d3b1898a
spec : add self-speculative decoding (no draft model required) + refactor ( #18471 )
* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate()
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common: use %zu format specifier for size_t in logging
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471 )
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
Georgi Gerganov
b931f81b5a
server : adjust spec tests to generate up to 16 tokens ( #19093 )
2026-01-28 09:11:40 +02:00