Pascal
20e5e70c61
chore: update webui build output
2026-02-13 13:21:35 +01:00
Pascal
a2cce59d69
fix: accurate tool_response display
2026-02-13 13:21:35 +01:00
Pascal
fdd67f45e6
fix: unify MCP server label logic with simplified fallback
2026-02-13 13:21:35 +01:00
Pascal
bdd9bcfb75
chore: update webui build output
2026-02-13 13:21:35 +01:00
Pascal
a515179730
refactor: remove multimodal validation from model selector
...
Remove all frontend validation logic that prevented users from selecting
models based on multimodal capabilities. This refactoring removes
restrictive UI code while maintaining full functionality. Rationale:
- Vision models can describe images as text
- That text remains useful for non-vision models
- Chaining vision -> non-vision is a valid workflow
- Users know their use case better than the UI
- Users can return to vision models when needed
2026-02-13 13:21:35 +01:00
Pascal
c7e76c65d1
chore: update webui build output
2026-02-13 13:21:35 +01:00
Pascal
37c084873c
fix: ignore assistant attachments (MCP) for modality detection
2026-02-13 13:21:35 +01:00
Pascal
d09cdfaf0a
chore: update webui build output
2026-02-13 13:21:35 +01:00
Pascal
6d41f74031
refactor: eliminate MCP circular dependency
...
- Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore
- Remove bidirectional callback pattern (set*Callback, notify* methods)
- Add updateState/updateHealthCheck public methods in mcpStore
- Replace callback calls with direct mcpStore method calls
- Remove unused imports (browser, HealthCheckState) and constructor
- Fixes CI: ReferenceError Cannot access mcpClient before initialization
2026-02-13 13:21:35 +01:00
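A minimal sketch of the one-way dependency described in 6d41f74031 (mcpClient -> mcpStore, no callbacks back from the store). This is an illustration in Python rather than the webui's own code; the class names, the `connect` method, and the stored values are assumptions, and only the `updateState`/`updateHealthCheck` method names come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class McpStore:
    """Holds UI-facing state and knows nothing about the client."""
    state: dict = field(default_factory=dict)
    health: dict = field(default_factory=dict)

    # Public methods standing in for updateState / updateHealthCheck.
    def update_state(self, server_id: str, status: str) -> None:
        self.state[server_id] = status

    def update_health_check(self, server_id: str, ok: bool) -> None:
        self.health[server_id] = ok


class McpClient:
    """Depends on the store; the store never references the client."""

    def __init__(self, store: McpStore) -> None:
        self._store = store

    def connect(self, server_id: str) -> None:
        # Hypothetical operation: instead of firing a registered callback,
        # the client calls the store's public methods directly.
        self._store.update_state(server_id, "connected")
        self._store.update_health_check(server_id, True)


store = McpStore()
client = McpClient(store)
client.connect("demo-server")
```

Because the store can be constructed without the client existing, there is no initialization order in which one module reads the other before it is defined, which is the kind of failure the CI ReferenceError points at.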
Pascal
07ae189175
chore: update webui build output
2026-02-13 13:21:34 +01:00
Pascal
23741b3c6a
fix: strip reasoning content and UI proprietary tags from prompts
...
TODO: add toggle and ensure backend API compliance for reasoning format
2026-02-13 13:21:34 +01:00
Pascal
b5b527fa52
chore: update webui build output
2026-02-13 13:21:34 +01:00
Pascal
fb1ec29898
refactor: remove reasoning after first turn filter
2026-02-13 13:21:34 +01:00
Pascal
fc5d9f587f
refactor: inline reasoning with tags, remove fixed thinking field
2026-02-13 13:21:34 +01:00
Pascal
6b3bc23fc2
chore: update webui build output
2026-02-13 13:21:34 +01:00
Pascal
c73baed7e3
feat: resolve MCP attachment images via rehype plugin
...
The LLM can reference tool-generated images using markdown links.
The plugin resolves attachment names to base64 from message.extra when present;
regular HTTP/data URLs pass through unchanged (no regression).
- rehypeResolveAttachmentImages plugin in markdown pipeline
- Pass message prop to MarkdownContent and AgenticContent
- Force processor reactivity on message.extra changes
- Filter assistant images from API context (display-only)
2026-02-13 13:21:34 +01:00
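The resolution rule from c73baed7e3 can be sketched as a small lookup. This is a hedged Python illustration, not the rehype plugin itself: the function name `resolve_image_src` and the shape of the attachment table (attachment name mapped to a ready-made data URL) are assumptions made for the example.

```python
def resolve_image_src(src: str, attachments: dict[str, str]) -> str:
    """Return the image source to render for a markdown image link."""
    # Regular HTTP and data URLs pass through unchanged.
    if src.startswith(("http://", "https://", "data:")):
        return src
    # Otherwise treat src as an attachment name; fall back to the
    # original value when no matching attachment is present.
    return attachments.get(src, src)


# Hypothetical attachment table standing in for message.extra.
extra = {"plot.png": "data:image/png;base64,AAAA"}

resolved = resolve_image_src("plot.png", extra)
untouched = resolve_image_src("https://example.com/a.png", extra)
missing = resolve_image_src("other.png", extra)
```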
Pascal
09381a59fd
feat: persist base64 attachments from tool results
2026-02-13 13:21:34 +01:00
Pascal
f16457551e
webui: fix custom headers persistence in UI (derived)
2026-02-13 13:21:34 +01:00
Pascal
f42e5f114e
webui: fix custom headers persistence in UI
2026-02-13 13:21:34 +01:00
Aleksander Grygier
162bd976ed
fix: Word wrapping
2026-02-13 13:21:34 +01:00
Aleksander Grygier
c2dd1d2fed
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
008463149b
feat: UI improvements
2026-02-13 13:21:34 +01:00
Aleksander Grygier
1dba2ec4a9
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
805c171825
feat: UI improvement
2026-02-13 13:21:34 +01:00
Aleksander Grygier
d6455a7530
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
bb4bd7fe09
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
05dfb5e70c
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
cad9ca1208
feat: MCP Server Details
2026-02-13 13:21:34 +01:00
Aleksander Grygier
0e980bf881
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
825d2ea9a9
feat: MCP connection details WIP
2026-02-13 13:21:34 +01:00
Aleksander Grygier
2b37f70c37
refactor: MCP types and health check
2026-02-13 13:21:34 +01:00
Aleksander Grygier
36a37d1794
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
38ba6d8372
refactor: KeyValuePairs component
2026-02-13 13:21:34 +01:00
Aleksander Grygier
c5465d4893
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
57089370e4
refactor: DRY
2026-02-13 13:21:34 +01:00
Aleksander Grygier
f80d5f615e
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
e1da51335c
refactor: Architecture improvements
2026-02-13 13:21:34 +01:00
Aleksander Grygier
3bc8d93546
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
48b2b1b2f0
refactor: MCP state management + stores/clients relationship
2026-02-13 13:21:34 +01:00
Aleksander Grygier
2cd682178b
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
da8baaa9b8
fix: Distinguish streaming vs incomplete tool calls in UI
2026-02-13 13:21:34 +01:00
Aleksander Grygier
3179858e5f
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
9471729162
fix: Restore live reactive UI progress for tool calls
2026-02-13 13:21:34 +01:00
Aleksander Grygier
64923b20be
chore: update webui build output
2026-02-13 13:21:34 +01:00
Pascal
179477b4ed
fix: reset tool call state between turns
2026-02-13 13:21:34 +01:00
Pascal
38244a1bfa
webui: enable streaming of tool call arguments
2026-02-13 13:21:34 +01:00
Aleksander Grygier
2faf237d01
chore: update webui build output
2026-02-13 13:21:34 +01:00
Aleksander Grygier
5ffb6aba3a
refactor: Cleanup
2026-02-13 13:21:34 +01:00
Pascal
96e51e2a41
webui: prevent mobile dropdown immediate close on synthetic click
2026-02-13 13:20:42 +01:00
Pascal
8916698294
webui: fix redirect to root ignoring base path
2026-02-13 13:20:42 +01:00
Aleksander Grygier
2a33fc2059
refactor: Cleanup
2026-02-13 13:20:41 +01:00
Aleksander Grygier
04913f20d9
chore: update webui build output
2026-02-13 13:20:41 +01:00
Aleksander Grygier
939e7aa16b
refactor: Types
2026-02-13 13:20:41 +01:00
Aleksander Grygier
bef865d871
refactor: Componentize McpServerCard
2026-02-13 13:20:41 +01:00
Aleksander Grygier
7dbb05a160
refactor: Cleanup
2026-02-13 13:20:41 +01:00
Aleksander Grygier
7e194f653a
fix: Remove redundant CSS class
2026-02-13 13:20:41 +01:00
Aleksander Grygier
02c87fa3c9
feat: Add TruncatedText component
2026-02-13 13:20:41 +01:00
Aleksander Grygier
27b80ae3e8
fix: Collapsible box trigger
2026-02-13 13:20:26 +01:00
Aleksander Grygier
408e098324
refactor: Cleanup
2026-02-13 13:20:26 +01:00
Aleksander Grygier
0b36d04c38
refactor: Cleanup
2026-02-13 13:20:07 +01:00
Aleksander Grygier
df464c1f5a
refactor: Collapsible Content Block & small fixes
2026-02-13 13:18:20 +01:00
Aleksander Grygier
26044454ef
chore: update webui build output
2026-02-13 13:18:20 +01:00
Aleksander Grygier
f0ac6fa039
refactor: Cleanup
2026-02-13 13:18:20 +01:00
Aleksander Grygier
7c9ba36216
chore: update webui build output
2026-02-13 13:18:20 +01:00
Aleksander Grygier
7ab269cd77
feat: UI improvements
2026-02-13 13:18:20 +01:00
Aleksander Grygier
e0122465ed
feat: Always show Mcp Selector
2026-02-13 13:18:20 +01:00
Pascal
36c9ad9303
fix: remove double scrollbar in model selector by using Bits UI content available height
2026-02-13 13:18:20 +01:00
Aleksander Grygier
bc60beb1a7
feat: Enable adding System Prompt per-chat
2026-02-13 13:18:20 +01:00
Aleksander Grygier
276a3e9416
fix: UI
2026-02-13 13:17:51 +01:00
Aleksander Grygier
c74065de75
chore: update webui build output
2026-02-13 13:17:51 +01:00
Aleksander Grygier
e6ad864984
feat: UI improvements
2026-02-13 13:17:51 +01:00
Pascal
cff237cb3e
webui: raw tool result display, strip only leading/trailing newlines to preserve indentation
2026-02-13 13:17:33 +01:00
Pascal
afb79b2970
webui: split raw output into backend parsing and frontend display options
2026-02-13 13:17:33 +01:00
Pascal
18efdabb12
webui: remove legacy wrapper and restore WebSocket transport
2026-02-13 13:17:33 +01:00
Pascal
a13782a4d1
webui: remove unused imports
2026-02-13 13:17:33 +01:00
Aleksander Grygier
d548bf27dd
chore: update webui build output
2026-02-13 13:17:33 +01:00
Aleksander Grygier
bdd5958f6d
feat: Improve agentic tool call streaming display with 'in progress' state
2026-02-13 13:17:32 +01:00
Aleksander Grygier
a9c2ea7a8e
feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides
2026-02-13 13:17:32 +01:00
Aleksander Grygier
dfce09b34b
feat: Add per-chat MCP server overrides
2026-02-13 13:17:32 +01:00
Aleksander Grygier
54374edecd
chore: update webui build output
2026-02-13 13:17:32 +01:00
Aleksander Grygier
b763a4cc69
feat: Add image load error fallback in MarkdownContent
2026-02-13 13:17:32 +01:00
Aleksander Grygier
af9a76b6dc
feat: Implement lazy MCP client shutdown
2026-02-13 13:17:32 +01:00
Aleksander Grygier
c7870a3903
feat: Enhance tool call streaming UI and output format
2026-02-13 13:17:32 +01:00
Aleksander Grygier
fb5e464fe7
feat: Display and manage servers in ChatForm actions
2026-02-13 13:17:32 +01:00
Aleksander Grygier
dc7a3f33ba
feat: Integrate server management dialog into chat settings
2026-02-13 13:03:15 +01:00
Aleksander Grygier
0b13c95519
feat: Implement dedicated server management UI components
2026-02-13 13:03:15 +01:00
Aleksander Grygier
8df7e4a54f
refactor: Centralize health check logic in store
2026-02-13 13:03:15 +01:00
Aleksander Grygier
9a8cae462e
feat: Enhance server config with headers and schema normalization
2026-02-13 13:03:15 +01:00
Aleksander Grygier
bc2d879dea
feat: Add McpLogo Svelte component
2026-02-13 13:03:15 +01:00
Aleksander Grygier
42d52605d9
refactor: Consolidate UI CSS classes into shared module
2026-02-13 13:03:15 +01:00
Aleksander Grygier
6c95020b06
chore: update webui build output
2026-02-13 12:57:23 +01:00
Aleksander Grygier
62dbc9f654
feat: Raw LLM output switch per message
2026-02-13 12:57:23 +01:00
Aleksander Grygier
284425097b
refactor: Tool call handling
2026-02-13 12:57:03 +01:00
Aleksander Grygier
5beeb88a37
docs: Update high-level architecture diagrams for MCP integration
2026-02-13 12:55:42 +01:00
Aleksander Grygier
acdd30e3af
feat: Add AgenticContent component for enhanced tool call rendering
2026-02-13 12:55:42 +01:00
Aleksander Grygier
49a8c8b148
refactor: Update ChatStore to leverage mcpStore for agentic flow
2026-02-13 12:55:42 +01:00
Aleksander Grygier
5b582beb75
feat: Implement agentic orchestration within ChatService
2026-02-13 12:55:03 +01:00
Aleksander Grygier
391479edb2
feat: Introduce reactive mcpStore for client lifecycle management
2026-02-13 12:55:03 +01:00
Aleksander Grygier
7e184c174d
feat: Refactor MCP client to use official SDK
2026-02-13 12:55:03 +01:00
Aleksander Grygier
1a041a5b9b
feat: Add @modelcontextprotocol/sdk and zod dependencies
2026-02-13 12:55:03 +01:00
Aleksander Grygier
2325d2a50d
refactor: Update Agentic and MCP config parsing to use new utils and constants
2026-02-13 12:55:03 +01:00
Aleksander Grygier
0c24db3178
feat: Centralize MCP and Agentic type definitions and constants
2026-02-13 12:55:02 +01:00
Aleksander Grygier
26a19183b7
feat: Introduce common utility functions
2026-02-13 12:55:02 +01:00
Pascal
14f6728ef1
webui: use normalizedMessages after upstream refactor
2026-02-13 12:55:02 +01:00
Pascal
cb99ed9f71
webui: MCP client with low coupling to current codebase
2026-02-13 12:55:02 +01:00
Sigbjørn Skjæret
b2ecc0cdb4
support --verbose-prompt ( #19576 )
2026-02-13 12:49:10 +01:00
Aleksander Grygier
5174d7206f
webui: UI and routing fixes ( #19586 )
...
* chore: update webui build output
* chore: update webui build output
* fix: Scroll issues in DropdownMenuSearchable
* webui: fix redirect to root ignoring base path
* fix: Word wrapping
* fix: remove obsolete modality UI tests causing CI failures
- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)
* feat: Improve formatting performance time
---------
Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Aleksander Grygier
4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output ( #19571 )
2026-02-12 19:55:51 +01:00
Aleksander Grygier
4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat ( #19556 )
...
* feat: Enable adding System Prompt per-chat
* fix: Save draft message in Chat Form when adding System Prompt from new chat view
* fix: Proper system message deletion logic
* chore: Formatting
* chore: update webui build output
2026-02-12 13:56:08 +01:00
Aleksander Grygier
f486ce9f30
(webui) REFACTOR: UI primitives and polish ( #19551 )
...
* webui: UI primitives and polish (non-MCP)
* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier
38adc7d469
WebUI Architecture Cleanup ( #19541 )
...
* webui: architecture foundation (non-MCP core refactors)
* chore: update webui build output
2026-02-12 11:22:27 +01:00
RichardScottOZ
fa16e517a3
server : fix typo in README.md for features list ( #19510 )
...
extra l for full
2026-02-12 08:56:25 +01:00
AesSedai
e463bbdf65
model: Add Kimi-K2.5 support ( #19170 )
...
* Move dequant_model to after the text_config merge
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key
* Fix a couple of oversights
* Add image support for Kimi-K2.5
* Revert changes to KimiVLForConditionalGeneration
* Fix an assert crash
* Fix permute swapping w / h by accident
* Kimi-K2.5: Use merged QKV for vision
* Kimi-K2.5: pre-convert vision QK to use build_rope_2d
* Kimi-K2.5: support non-interleaved rope for vision
* Kimi-K2.5: fix min / max pixel
* Kimi-K2.5: remove v/o permutes, unnecessary
* Kimi-K2.5: update permute name to match
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-11 16:47:30 +01:00
Georgi Gerganov
6d95707827
model : fix wavtokenizer embedding notions ( #19479 )
2026-02-11 07:52:20 +02:00
JJJYmmm
fc0fe40049
models : support qwen3.5 series ( #19468 )
...
* support qwen3.5 series
* remove deepstack for now, and some code clean
* code clean
* add FULL_ATTENTION_INTERVAL metadata
* code clean
* reorder v heads for linear attention to avoid expensive interleaved repeat
2026-02-10 18:00:26 +02:00
Daniel Bevenius
66d403c480
tts : fix typos in README.md [no ci] ( #19463 )
2026-02-10 07:30:41 +01:00
Tarek Dakhran
262364e31d
mtmd: Implement tiling for LFM2-VL ( #19454 )
2026-02-09 17:30:32 +01:00
손희준
820ebfa6f4
Server: log when converting requests to chat completions format ( #19457 )
...
* Log converting requests
* Print as debug instead of info [no ci]
---------
Co-authored-by: openingnow <>
2026-02-09 16:22:57 +01:00
Sascha Rogmann
292f6908cd
spec : remove check rate ( #19377 )
...
* spec: remove parameter spec-ngram-check-rate
* spec : renamed statistics vars
* spec : add n_call_begin, n_call_accept
* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Adrien Gallouët
5fa1c190d9
rpc : update from common.cpp ( #19400 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-08 09:06:45 +01:00
Georgi Gerganov
eb449cdfa4
server : improve context checkpoint logic ( #19408 )
2026-02-08 09:40:04 +02:00
ddh0
5999b50eb0
llama-quantize : cleanup `--help` output ( #19317 )
...
* cleanup `llama-quantize --help` output
some much needed TLC
* remove future argument
oops, spoiler
* cleanup of cleanup
2026-02-08 09:22:38 +02:00
Georgi Gerganov
dfde5993ea
common : add common_speculative_is_compat() ( #19270 )
...
* llama : add llama_memory_can_rm_suffix()
* Revert "llama : add llama_memory_can_rm_suffix()"
This reverts commit d30e59b62a.
* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Daniel Bevenius
25f40ca65f
completion : simplify batch (embd) processing ( #19286 )
...
* completion : simplify batch (embd) processing
This commit simplifies the processing of embd by removing the for loop
that currently exists which uses params.n_batch as its increment. This
commit also removes the clamping of n_eval as the size of embd is always
at most the size of params.n_batch.
The motivation is to clarify the code as it is currently a little
confusing when looking at this for loop in isolation and thinking that
it can process multiple batches.
* add an assert to verify n_eval is not greater than n_batch
2026-02-04 05:43:28 +01:00
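The simplification in 25f40ca65f can be illustrated with a before/after pair. This is a sketch in Python under the commit's stated invariant (embd is never larger than n_batch); the function names and the `decode` callback are hypothetical stand-ins, not the actual completion code.

```python
def eval_with_loop(embd, n_batch, decode):
    # Before: step through embd in n_batch increments and clamp n_eval.
    for i in range(0, len(embd), n_batch):
        n_eval = min(len(embd) - i, n_batch)
        decode(embd[i:i + n_eval])


def eval_single_batch(embd, n_batch, decode):
    # After: embd always fits in one batch, so assert that and decode once.
    n_eval = len(embd)
    assert n_eval <= n_batch, "n_eval must not be greater than n_batch"
    if n_eval:
        decode(embd)


calls_before, calls_after = [], []
tokens = [11, 22, 33]
eval_with_loop(tokens, 8, calls_before.append)
eval_single_batch(tokens, 8, calls_after.append)
```

When the invariant holds, the loop body runs at most once, so both versions issue the same decode calls; the assert makes the assumption explicit instead of hiding it behind a loop.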
Xuan-Son Nguyen
07a7412a3b
mtmd: add min/max pixels gguf metadata ( #19273 )
2026-02-02 20:59:06 +01:00
Matthieu Coudron
a3fa035822
server: print actual model name in 'model not found' error ( #19117 )
...
Experimenting with AI, my environment gets messy fast and it's not
always easy to know what model my software is trying to load. This helps
with troubleshooting.
Before:
Error: {
code = 400,
message = "model not found",
type = "invalid_request_error"
}
After:
Error: {
code = 400,
message = "model 'toto' not found",
type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
Christian Kastner
7a4ca3cbd9
docs : Minor cleanups ( #19252 )
...
* Update old URLs to github.com/ggml-org/
* Bump copyrights
2026-02-02 08:38:55 +02:00
EugeoSynthesisThirtyTwo
3dd95914d0
quantize: add option --tensor-type-file to llama-quantize ( #18572 )
...
* add option --tensor-type-file to llama-quantize, but it raises an error.
* add error message when file not found
* quantize: update help menu, fix CI
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
tc-mb
ec6c7421e4
mtmd: support MiniCPM-o 4.5(vision only) ( #19211 )
...
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
2026-01-30 23:19:30 +01:00
Georgi Gerganov
bbada8bfb9
server : wrap around the "id_slot" parameter ( #19207 )
...
* server : wrap around the "id_slot" parameter
* cont : minor
2026-01-30 19:46:10 +02:00
Georgi Gerganov
dabaa2e77a
spec : add ngram-mod ( #19164 )
...
* spec : add ngram-mod
* cont : simplify + keep track of occupancy
* cont : cleanup
* cont : move initialization to common/speculative
* cont : cleanup
* cont : cleanup
* cont : fix
2026-01-30 18:21:48 +02:00
Andrew Marshall
84b0a98319
webui: Update Svelte to fix effect_update_depth_exceeded errors ( #19144 )
...
The upstream fix is first available in 5.38.2, so constrain to at least
that version.
Rebuild pre-compiled webui index.html.gz based on these changes.
See also:
https://github.com/ggml-org/llama.cpp/issues/16347
https://github.com/huntabyte/bits-ui/issues/1687
https://github.com/sveltejs/svelte/issues/16548
2026-01-29 15:56:39 +01:00
Sascha Rogmann
72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor ( #18471 )
...
* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate()
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common: use %zu format specifier for size_t in logging
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471 )
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
Georgi Gerganov
b931f81b5a
server : adjust spec tests to generate up to 16 tokens ( #19093 )
2026-01-28 09:11:40 +02:00
Georgi Gerganov
080b161995
completion : fix prompt cache for recurrent models ( #19045 )
2026-01-25 09:12:50 +02:00
Daniel Bevenius
16639ba217
common : use two decimal places for float arg help messages ( #19048 )
...
* common : use two decimal places for float arg help messages
This commit updates the help messages for various command-line arguments
in arg.cpp to display floating-point default values with two decimal
places instead of one.
The motivation for this change is that currently only having one decimal
place means that values generated using --help or llama-gen-docs will not
display the correct values.
For example, currently the value of top-p in tools/server/README.md is
`0.9`, but the default value is actually '0.95'. And running
llama-gen-docs does not update this value as it uses the output from the
help message, which shows only one decimal place, so the values look
like they are unchanged.
* docs : run llama-gen-docs to update docs
2026-01-25 07:31:42 +01:00
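The effect described in 16639ba217 can be reproduced with printf-style formatting. The sketch below uses Python's `%` operator, which follows the same `%.1f` / `%.2f` conventions as C; it is an illustration of the rounding issue, not the code in arg.cpp, and the variable names are chosen for the example.

```python
top_p = 0.95  # default value cited in the commit message

# One decimal place hides the second digit of the default.
one_decimal = "%.1f" % top_p

# Two decimal places show the value as configured.
two_decimals = "%.2f" % top_p

print(one_decimal, two_decimals)
```

The nearest binary floating-point value to 0.95 is slightly below it, so the one-decimal form rounds down to 0.9, matching the stale README value the commit mentions.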
Johannes Gäßler
e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 ( #19070 )
2026-01-24 22:13:08 +01:00
Aldehir Rojas
a3e812811d
cli : load parser definition ( #19031 )
...
* cli : load parser definition
* cont : only unload if a parser is defined
2026-01-22 20:31:22 -06:00
Xuan-Son Nguyen
51fa458a92
server : support preserving reasoning_content in assistant message ( #18994 )
...
* support reasoning_content input
* report template caps to webui
* add docs
* rm commented code
2026-01-22 21:30:06 +01:00
Xuan-Son Nguyen
4e595b250a
server: do not log certain endpoints (avoid log spam) ( #19028 )
2026-01-22 19:24:37 +01:00
Xuan-Son Nguyen
9eb5bfec1a
mtmd : update docs to use llama_model_n_embd_inp ( #18999 )
2026-01-22 14:36:32 +01:00
손희준
c6926d1d95
server: Reorder methods in `server-task.cpp` ( #19016 )
...
* Move `task_result_state::update_chat_msg` to match with header
* Move `server_task_result_cmpl_partial::to_json_anthropic()` to match with header
---------
Co-authored-by: openingnow <>
2026-01-22 14:36:04 +01:00
Hendrik Erz
3802d3c78f
fix: Use `tabular-nums` for chat message statistics ( #18915 )
...
* fix: Use `tabular-nums` for chat message statistics
* fix: Rebuild WebUI
2026-01-21 18:46:01 +01:00
손희준
fbbf3ad190
server: /v1/responses (partial) ( #18486 )
...
* from previous PR
* Make instruction(system) as first message
* Convert [input_message] (text/image/file)
* Rename convert_responses_to_chatcmpl(body) -> response_body
* Initial tool call support
* Erase instructions field from chatcmpl body
* Feed reasoning texts to chat template
* Use std::vector instead of opaque json array
* Make output_item.added events consistent
* Move `server_task_result_cmpl_partial::update` from header to source
* Match ID of output_item.added and .done events
* Add function_call only if there is no "fc_" prefix
* Add function call output at non-streaming API
* Test if ID is persistent
* Add doc
* Fix style - use trailing comma
* Rewrite state management
* catch up with upstream/master
* Fix style - "type" is the first item of SSE data
* Explicitly check "instructions" from response_body
* Make lambdas static
* Check if reasoning content exists
* Add `oai_resp_id` to task_result_state(also initialized at ctor), server_task_result_cmpl_partial, and server_task_result_cmpl_final
* Reject `input_file` since it is not supported by chatcmpl
* Add "fc_" prefix to non-streaming function call id as coderabbit pointed out
---------
Co-authored-by: openingnow <>
2026-01-21 17:47:23 +01:00
Adrien Gallouët
1c7cf94b22
common, server : use the same User-Agent by default ( #18957 )
...
This commit also ensures that if a custom User-Agent is used, it will be
the only one sent.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-20 18:28:43 +01:00
Xuan-Son Nguyen
2c1f199653
cli : fix reasoning responses in CLI ( #18961 )
...
* cli : fix reasoning responses in CLI
* fix build
* fix build (2)
2026-01-20 18:23:25 +01:00
Xuan-Son Nguyen
6df686bee6
server : refactor oai_parser_opt, move it to server_chat_params ( #18937 )
...
* server_chat_params
* move chat format into CLI
* use meta whenever possible
* clean up, no more chatml fallback
2026-01-19 23:28:01 +01:00
Lennart Austenfeld
18361c579c
server: fix memory reservations in populate_token_probs ( #18787 )
2026-01-19 19:13:31 +01:00
Tarek Dakhran
c945aaaef2
mtmd : Fix ASR for LFM2.5-Audio-1.5B ( #18876 )
2026-01-16 11:23:08 +01:00
Xuan-Son Nguyen
c15395f73c
common : implement new jinja template engine ( #18462 )
...
* jinja vm
* lexer
* add vm types
* demo
* clean up
* parser ok
* binary_expression::execute
* shadow naming
* bin ops works!
* fix map object
* add string builtins
* add more builtins
* wip
* use mk_val
* eval with is_user_input
* render gemma tmpl ok
* track input string even after transformations
* support bound functions
* keyword arguments and slicing array
* use shared_ptr for values
* add mk_stmt
* allow print source on exception
* fix negate test
* testing more templates
* mostly works
* add filter_statement
* allow func to access ctx
* add jinja-value.cpp
* impl global_from_json
* a lot of fixes
* more tests
* more fix, more tests
* more fixes
* rm workarounds
* demo: type inference
* add placeholder for tojson
* improve function args handling
* rm type inference
* no more std::regex
* trailing spaces
* make testing more flexible
* make output a bit cleaner
* (wip) redirect minja calls
* test: add --output
* fix crash on macro kwargs
* add minimal caps system
* add some workarounds
* rm caps_apply_workarounds
* get rid of preprocessing
* more fixes
* fix test-chat-template
* move test-chat-jinja into test-chat-template
* rm test-chat-jinja from cmake
* test-chat-template: use common
* fix build
* fix build (2)
* rename vm --> interpreter
* improve error reporting
* correct lstrip behavior
* add tojson
* more fixes
* disable tests for COMMON_CHAT_FORMAT_GENERIC
* make sure tojson output correct order
* add object.length
* fully functional selectattr / rejectattr
* improve error reporting
* more builtins added, more fixes
* create jinja rendering tests
* fix testing.h path
* adjust whitespace rules
* more fixes
* temporarily disable test for ibm-granite
* r/lstrip behavior matched with hf.js
* minimax, glm4.5 ok
* add append and pop
* kimi-k2 ok
* test-chat passed
* fix lstrip_block
* add more jinja tests
* cast to unsigned char
* allow dict key to be numeric
* nemotron: rm windows newline
* tests ok
* fix test
* rename interpreter --> runtime
* fix build
* add more checks
* bring back generic format support
* fix Apertus
* [json.exception.out_of_range.403] key 'content' not found
* rm generic test
* refactor input marking
* add docs
* fix windows build
* clarify error message
* improved tests
* split/rsplit with maxsplit
* non-inverse maxsplit
forgot to change after simplifying
* implement separators for tojson and fix indent
* i like to move it move it
* rename null --> none
* token::eof
* some nits + comments
* add exception classes for lexer and parser
* null -> none
* rename global -> env
* rm minja
* update docs
* docs: add input marking caveats
* implement missing jinja-tests functions
* oops
* support trim filter with args, remove bogus to_json reference
* numerous argument fixes
* updated tests
* implement optional strip chars parameter
* use new chars parameter
* float filter also has default
* always leave at least one decimal in float string
* jinja : static analysis + header cleanup + minor fixes
* add fuzz test
* add string.cpp
* fix chat_template_kwargs
* nits
* fix build
* revert
* unrevert
sorry :)
* add fuzz func_args, refactor to be safer
* fix array.map()
* loosen ensure_vals max count condition, add not impl for map(int)
* hopefully fix windows
* check if empty first
* normalize newlines
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 11:22:06 +01:00