Pascal
78c6380222
refactor: remove reasoning after first turn filter
2026-01-16 15:19:50 +01:00
Pascal
2973c64609
refactor: inline reasoning with tags, remove fixed thinking field
2026-01-16 15:19:42 +01:00
Xuan-Son Nguyen
c15395f73c
common : implement new jinja template engine (#18462)
...
* jinja vm
* lexer
* add vm types
* demo
* clean up
* parser ok
* binary_expression::execute
* shadow naming
* bin ops works!
* fix map object
* add string builtins
* add more builtins
* wip
* use mk_val
* eval with is_user_input
* render gemma tmpl ok
* track input string even after transformations
* support bound functions
* keyword arguments and slicing array
* use shared_ptr for values
* add mk_stmt
* allow print source on exception
* fix negate test
* testing more templates
* mostly works
* add filter_statement
* allow func to access ctx
* add jinja-value.cpp
* impl global_from_json
* a lot of fixes
* more tests
* more fix, more tests
* more fixes
* rm workarounds
* demo: type inference
* add placeholder for tojson
* improve function args handling
* rm type inference
* no more std::regex
* trailing spaces
* make testing more flexible
* make output a bit cleaner
* (wip) redirect minja calls
* test: add --output
* fix crash on macro kwargs
* add minimal caps system
* add some workarounds
* rm caps_apply_workarounds
* get rid of preprocessing
* more fixes
* fix test-chat-template
* move test-chat-jinja into test-chat-template
* rm test-chat-jinja from cmake
* test-chat-template: use common
* fix build
* fix build (2)
* rename vm --> interpreter
* improve error reporting
* correct lstrip behavior
* add tojson
* more fixes
* disable tests for COMMON_CHAT_FORMAT_GENERIC
* make sure tojson outputs keys in the correct order
* add object.length
* fully functional selectattr / rejectattr
* improve error reporting
* more builtins added, more fixes
* create jinja rendering tests
* fix testing.h path
* adjust whitespace rules
* more fixes
* temporarily disable test for ibm-granite
* r/lstrip behavior matched with hf.js
* minimax, glm4.5 ok
* add append and pop
* kimi-k2 ok
* test-chat passed
* fix lstrip_block
* add more jinja tests
* cast to unsigned char
* allow dict key to be numeric
* nemotron: rm windows newline
* tests ok
* fix test
* rename interpreter --> runtime
* fix build
* add more checks
* bring back generic format support
* fix Apertus
* [json.exception.out_of_range.403] key 'content' not found
* rm generic test
* refactor input marking
* add docs
* fix windows build
* clarify error message
* improved tests
* split/rsplit with maxsplit
* non-inverse maxsplit
forgot to change after simplifying
* implement separators for tojson and fix indent
* i like to move it move it
* rename null --> none
* token::eof
* some nits + comments
* add exception classes for lexer and parser
* null -> none
* rename global -> env
* rm minja
* update docs
* docs: add input marking caveats
* implement missing jinja-tests functions
* oops
* support trim filter with args, remove bogus to_json reference
* numerous argument fixes
* updated tests
* implement optional strip chars parameter
* use new chars parameter
* float filter also has default
* always leave at least one decimal in float string
* jinja : static analysis + header cleanup + minor fixes
* add fuzz test
* add string.cpp
* fix chat_template_kwargs
* nits
* fix build
* revert
* unrevert
sorry :)
* add fuzz func_args, refactor to be safer
* fix array.map()
* loosen ensure_vals max count condition, add not impl for map(int)
* hopefully fix windows
* check if empty first
* normalize newlines
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 11:22:06 +01:00
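Several items in the commit above (`split/rsplit with maxsplit`, the optional strip chars parameter, `tojson` separators and indent, "always leave at least one decimal in float string") track behavior that chat templates inherit from Python, since they are normally written against Python's Jinja2. As a reference point only, the snippet below shows what the Python side does for each of these. Whether the new engine matches these outputs byte for byte is not stated in the log and is assumed here.

```python
import json

# maxsplit counts splits from the left for split() and from the right for rsplit().
left = "a,b,c,d".split(",", 1)      # ['a', 'b,c,d']
right = "a,b,c,d".rsplit(",", 1)    # ['a,b,c', 'd']

# strip() with a chars argument removes any of those characters from both ends.
trimmed = "--hello--".strip("-")    # 'hello'

# json.dumps keeps dict insertion order; separators and indent control the layout.
compact = json.dumps({"b": 1, "a": [1, 2]}, separators=(",", ":"))
pretty = json.dumps({"b": 1}, indent=2)

# Converting a float to a string always keeps at least one decimal digit.
as_text = str(float(3))             # '3.0'
```

A template that renders `{{ 3 | float }}` or calls `.rsplit(",", 1)` depends on exactly these conventions, which is presumably why the commit lists them individually.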
Pascal
a1550ab77d
chore: update webui build output
2026-01-16 11:02:17 +01:00
Pascal
db37b712b2
feat: resolve MCP attachment images via rehype plugin
...
The LLM can reference tool-generated images using markdown image links.
The plugin resolves attachment names to base64 from message.extra when present;
regular HTTP/data URLs pass through unchanged (no regression).
- rehypeResolveAttachmentImages plugin in markdown pipeline
- Pass message prop to MarkdownContent and AgenticContent
- Force processor reactivity on message.extra changes
- Filter assistant images from API context (display-only)
2026-01-16 10:49:28 +01:00
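The resolution rule described in the commit above can be sketched as a small lookup. This is an illustration in Python rather than the actual rehype plugin, and the `name` and `base64Url` keys as well as the function name are hypothetical stand-ins for whatever `message.extra` really contains.

```python
def resolve_image_src(src, extras):
    """Map an attachment name to its base64 data URL, if one is stored.

    `extras` stands in for message.extra; the key names are assumptions.
    """
    # Regular HTTP and data URLs pass through unchanged.
    if src.startswith(("http://", "https://", "data:")):
        return src
    # Otherwise treat src as an attachment name and look it up.
    for item in extras:
        if item.get("name") == src and item.get("base64Url"):
            return item["base64Url"]
    # Unknown names are left as they are.
    return src
```

Leaving unknown names untouched keeps the "no regression" property: a link the plugin cannot resolve renders exactly as it did before.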
Pascal
a3c2144c1d
feat: persist base64 attachments from tool results
2026-01-16 08:07:20 +01:00
Pascal
a377605f60
webui: fix custom headers persistence in UI (derived)
2026-01-15 20:36:14 +01:00
Pascal
3360f60b94
webui: fix custom headers persistence in UI
2026-01-15 20:13:01 +01:00
ddh0
13f1e4a9ca
llama : add adaptive-p sampler (#17927)
...
* initial commit for branch
* simplify constants
* add params to `struct common_params_sampling`, add reference to PR
* explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`
* add args, rename `queue_size` -> `window_size`
* improved comments
* minor
* remove old unused code from algorithm
* minor
* add power law case to `common_sampler_init`, add sampler name mappings
* clarify behaviour when `window_size = 0`
* add missing enums
* remove `target_range` param, make `target == 1` no-op, cleanup code
* oops, straggler
* add missing parameters in `server-task.cpp`
* copy from author
ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
* remove old debug log, style nit
* fix compiler warning, add commented-out logging per token
* re-write + change parameters + simplify
* oops forgot args.cpp
* fix leftover `window_size`
* add missing values to `common_params_sampling::print()`
* with logging
* does this fix it?
* no, but does this?
* update default decay
* optimize
* fix bad merge
my git skills are lacking
* silence `missing initializer for member`
* update default decay to 0.9
* fix logging
* format (double)
* add power law to the new `samplers` vector
* log sampler init values
* improve logging messages in llama_sampler_power_law
* remove extraneous logging
* simplify target computation
last commit with debug logging!
* remove debug logging, explicitly clamp params at init
* add `use_power_law` flag + logic, minor cleanup
* update `power-law` -> `adaptive-p`
* fix cold start EMA
- `ctx->weighted_sum` is now initialized and reset to `target / (1.0f - clamped_decay)`
- `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f - clamped_decay)`
this fixes a "cold start" problem with the moving average
* update `SHARPNESS` constant to `10.0f`
* minor style fixes
no functional changes
* minor style fixes cont.
* update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)
* separate into `apply` + `accept` functions
* `pending_token_idx`: switch from `llama_token` to `int32`
functionally identical (`llama.h` has `typedef int32_t llama_token;`),
but it's more correct now
* don't transform logits <= -1e9f
* fix masking in backend top-p, min-p
* address review comments
* typo in comments `RND` -> `RNG`
* add docs
* add recommended values in completion docs
* address PR feedback
* remove trailing whitespace (for CI `editorconfig`)
* add adaptive-p to `common_sampler_types_from_chars`
2026-01-15 19:16:29 +02:00
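The "fix cold start EMA" item in the commit above gives two initial values. A short Python sketch shows why they help, assuming the moving average is kept as a decayed weighted sum and a decayed total weight (the update rule and the clamp bounds below are assumptions, not taken from the sampler source). With that rule, `1 / (1 - decay)` is the steady-state total weight, so the initial state behaves like a long history in which every observation equalled `target`.

```python
def make_state(target, decay, cold_start_fix=True):
    # Hypothetical clamp; the log only says the decay is clamped.
    decay = min(max(decay, 0.0), 0.99)
    if cold_start_fix:
        weighted_sum = target / (1.0 - decay)
        total_weight = 1.0 / (1.0 - decay)
    else:
        weighted_sum, total_weight = 0.0, 0.0
    return {"decay": decay, "weighted_sum": weighted_sum, "total_weight": total_weight}


def update(state, p):
    # Assumed update rule: decay the history, then add the new observation.
    state["weighted_sum"] = p + state["decay"] * state["weighted_sum"]
    state["total_weight"] = 1.0 + state["decay"] * state["total_weight"]


def average(state):
    if state["total_weight"] == 0.0:
        return None  # no history yet
    return state["weighted_sum"] / state["total_weight"]


fixed = make_state(target=0.5, decay=0.9)
naive = make_state(target=0.5, decay=0.9, cold_start_fix=False)
update(fixed, 1.0)
update(naive, 1.0)
```

After one observation of `1.0`, the naive state's average jumps straight to that observation, while the pre-seeded state moves only a fraction `1 - decay` of the way from `target` toward it.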
Aleksander Grygier
cffc3b46ae
fix: Word wrapping
2026-01-15 17:59:57 +01:00
Xuan-Son Nguyen
a04c2b06a3
server: improve slots scheduling for n_cmpl (#18789)
...
* server : make sure children tasks are scheduled to launch with parent
* fix
* add comment pointing to this PR
* fix
* clean up
* more debug messages
* add pop_deferred_task with specific ID version
* improve the logic
* simple approach
* no double move
* correct return type of launch_slots_with_parent_task
2026-01-15 17:10:28 +01:00
Georgi Gerganov
39173bcacb
context : reserve new scheduler when graph topology changes (#18547)
...
* context : reserve new scheduler when graph topology changes
* cont : fix
* cont : fix reserve
* cont : reserve only when changes occur + timing
* context : add comments
* llama : reserve on sampler changes
* common : allow null common_sampler
* server : task declares needs (embd, logits, sampling)
* server : do not init sampler if not needed
* llama : fix need_reserve when unsetting a sampler
* server : consolidate slot reset/clear logic
2026-01-15 16:39:17 +02:00
Aleksander Grygier
5417a439ef
chore: update webui build output
2026-01-15 11:39:10 +01:00
Aleksander Grygier
30a585bb96
feat: UI improvements
2026-01-14 17:32:57 +01:00
Aleksander Grygier
886939c550
chore: update webui build output
2026-01-14 14:39:32 +01:00
Aleksander Grygier
39848ee12f
feat: UI improvement
2026-01-14 14:26:41 +01:00
Aleksander Grygier
c1ac8d7326
chore: update webui build output
2026-01-14 13:22:01 +01:00
Aleksander Grygier
afdae742e3
Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp
2026-01-14 13:20:25 +01:00
Aleksander Grygier
b11b32ea28
chore: update webui build output
2026-01-14 12:47:13 +01:00
Aleksander Grygier
06efeb6eb9
chore: update webui build output
2026-01-14 11:49:26 +01:00
Aleksander Grygier
f89bcb90ca
feat: MCP Server Details
2026-01-14 11:45:47 +01:00
Aleksander Grygier
120f3c978c
chore: update webui build output
2026-01-12 18:27:54 +01:00
Aleksander Grygier
5407b2efab
feat: MCP connection details WIP
2026-01-12 18:26:48 +01:00
Radoslav Gerganov
bcf7546160
server : add arg for disabling prompt caching (#18776)
...
* server : add arg for disabling prompt caching
Disabling prompt caching is useful for clients who are restricted to
sending only OpenAI-compat requests and want deterministic
responses.
* address review comments
* address review comments
2026-01-12 19:21:34 +02:00
Aleksander Grygier
0009c0c300
refactor: MCP types and health check
2026-01-12 18:12:08 +01:00
Aleksander Grygier
0180becb8b
chore: update webui build output
2026-01-12 15:26:46 +01:00
Aleksander Grygier
08c1acd1db
refactor: KeyValuePairs component
2026-01-12 15:25:43 +01:00
Aleksander Grygier
392a6dce0d
chore: update webui build output
2026-01-12 15:15:19 +01:00
Aleksander Grygier
a44332b528
refactor: DRY
2026-01-12 15:10:18 +01:00
Aleksander Grygier
80e829a248
chore: update webui build output
2026-01-12 14:49:11 +01:00
Aleksander Grygier
60ef752d0f
refactor: Architecture improvements
2026-01-12 14:45:24 +01:00
Aleksander Grygier
a63a421952
chore: update webui build output
2026-01-12 14:18:15 +01:00
Aleksander Grygier
58ab834b18
refactor: MCP state management + stores/clients relationship
2026-01-12 14:17:06 +01:00
Xuan-Son Nguyen
ce3bf9b1a4
server: update docs for sleeping [no ci] (#18777)
2026-01-12 13:01:24 +01:00
Aleksander Grygier
9c53bd4486
chore: update webui build output
2026-01-12 11:16:18 +01:00
Aleksander Grygier
528a560a25
fix: Distinguish streaming vs incomplete tool calls in UI
2026-01-12 11:15:58 +01:00
Aleksander Grygier
aa9054367a
chore: update webui build output
2026-01-12 11:10:24 +01:00
Aleksander Grygier
cead02ee58
fix: Restore live reactive UI progress for tool calls
2026-01-12 11:07:56 +01:00
Aleksander Grygier
c6843d0054
chore: update webui build output
2026-01-12 11:02:42 +01:00
Aleksander Grygier
b5226ebd86
Merge origin/allozaur/mcp-mvp: enable streaming of tool call arguments
...
Resolves conflicts by:
- Keeping clean store architecture (agentic.svelte.ts delegates to client)
- Updating agentic.client.ts to use TOOL_ARGS_START/END format
- Accepting remote AgenticContent.svelte with direct JSON parsing
- Updating ChatMessageAssistant to match new AgenticContent props
2026-01-12 10:55:34 +01:00
Aleksander Grygier
01dfe0ee4c
chore: update webui build output
2026-01-12 10:37:12 +01:00
Aleksander Grygier
144148125b
refactor: Cleanup
2026-01-12 10:28:59 +01:00
Pascal
a02acca38d
fix: reset tool call state between turns
2026-01-10 19:14:13 +01:00
Pascal
b7288a4dd7
webui: enable streaming of tool call arguments
2026-01-10 18:59:57 +01:00
Georgi Gerganov
f307926482
server : adjust unified KV cache tests (#18716)
2026-01-10 17:51:56 +02:00
Xuan-Son Nguyen
9ac2693a30
server: fix n_cmpl not skipping processing prompt (#18663)
...
* server: fix n_cmpl not skipping processing
* fix infinite loop on empty batch
* cont : init child samplers + modify child logic
* cont : cleanup
* cont : improve n_cmpl logic
- launch the parent task first so it finds the slot with best cache
- parent task waits for child tasks to be launched
- when a child task finishes - remove its cache
* cont : remove redundant function
* cont : reduce parent checks
* fix : nullptr task dereference
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-10 00:00:41 +01:00
Pascal
ec8fd7876b
Webui/file upload (#18694)
...
* webui: fix restrictive file type validation
* webui: simplify file processing logic
* chore: update webui build output
* webui: remove file picker extension whitelist (1/2)
* webui: remove file picker extension whitelist (2/2)
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
* fix: update ChatForm storybook test after removing accept attribute
* chore: update webui build output
* refactor: more cleanup
* chore: update webui build output
2026-01-09 16:45:32 +01:00
Georgi Gerganov
53eb9435da
server : fix timing of prompt/generation (#18713)
2026-01-09 12:59:50 +02:00
Georgi Gerganov
f5f8812f7c
server : use different seeds for child completions (#18700)
...
* server : use different seeds for child completions
* cont : handle default seed
* cont : note
2026-01-09 09:33:50 +02:00
Pascal
74b119e81e
webui: prevent mobile dropdown immediate close on synthetic click
2026-01-08 22:48:56 +01:00
Pascal
d000d84201
webui: fix redirect to root ignoring base path
2026-01-08 15:33:23 +01:00
Aleksander Grygier
2c0add6a90
Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp
2026-01-08 15:02:05 +01:00
Aleksander Grygier
e3ca595651
chore: update webui build output
2026-01-08 14:54:45 +01:00
Aleksander Grygier
6f7750489e
refactor: Types
2026-01-08 14:45:47 +01:00
Aleksander Grygier
dfd3031b17
refactor: Componentize McpServerCard
2026-01-08 14:18:30 +01:00
Aleksander Grygier
835c06e0d1
refactor: Cleanup
2026-01-08 14:18:12 +01:00
Aleksander Grygier
ddbb7dc2e5
fix: Remove redundant CSS class
2026-01-08 14:11:52 +01:00
Adrien Gallouët
55abc39355
vendor : update cpp-httplib to 0.30.0 (#18660)
...
* vendor : update cpp-httplib to 0.30.0
* common : allow custom headers when downloading
2026-01-08 13:53:54 +01:00
Aleksander Grygier
bf2a793f42
refactor: Cleanup
2026-01-08 13:49:55 +01:00
Aleksander Grygier
089f38230c
feat: Add TruncatedText component
2026-01-08 13:02:46 +01:00
Aleksander Grygier
06febe08b7
fix: Collapsible box trigger
2026-01-08 12:48:15 +01:00
Aleksander Grygier
223c6333e9
refactor: Cleanup
2026-01-08 12:46:10 +01:00
Aleksander Grygier
b0ba550928
refactor: Cleanup
2026-01-08 12:03:36 +01:00
Aleksander Grygier
56b34bf63b
refactor: Collapsible Content Block & small fixes
2026-01-08 09:17:24 +01:00
Aleksander Grygier
d89ada8cee
chore: update webui build output
2026-01-07 15:46:32 +01:00
Aleksander Grygier
98bce85b1f
refactor: Cleanup
2026-01-07 15:44:23 +01:00
Aleksander Grygier
b9adc00d3f
chore: update webui build output
2026-01-07 14:27:48 +01:00
Aleksander Grygier
10e5ad1396
feat: UI improvements
2026-01-07 14:01:27 +01:00
Aleksander Grygier
bc07e0723d
feat: Always show Mcp Selector
2026-01-07 14:01:27 +01:00
Pascal
4c095df509
fix: remove double scrollbar in model selector by using Bits UI content available height
2026-01-07 12:23:03 +01:00
R
3d26a09dc7
server : add thinking content blocks to Anthropic Messages API (#18551)
...
* server : add thinking content blocks to Anthropic Messages API
Add support for returning reasoning/thinking content in Anthropic API
responses when using models with --reasoning-format deepseek and the
thinking parameter enabled.
- Non-streaming: adds thinking block before text in content array
- Streaming: emits thinking_delta events with correct block indices
- Partial streaming: tracks reasoning state across chunks via
anthropic_has_reasoning member variable
Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.
* server : fix Anthropic API streaming for thinking content blocks
Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.
* server: refactor Anthropic streaming state to avoid raw pointer
Replace raw pointer to task_result_state with direct field copies:
- Copy state fields in update() before processing chunk
- Use local copies in to_json_anthropic() instead of dereferencing
- Pre-compute state updates for next chunk in update()
This makes the data flow clearer and avoids unsafe pointer patterns.
2026-01-06 16:17:13 +01:00
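The block ordering described in the commit above can be sketched as follows. This is an illustrative Python helper, not the server's code: the function names are hypothetical, and the empty default for `signature` is an assumption (the log only says a signature field was added).

```python
def build_content(text, reasoning="", signature=""):
    """Non-streaming case: the thinking block, if any, precedes the text block."""
    content = []
    if reasoning:
        content.append({
            "type": "thinking",
            "thinking": reasoning,
            "signature": signature,  # assumed placeholder value
        })
    content.append({"type": "text", "text": text})
    return content


def block_indices(has_reasoning):
    """Streaming case: the text block shifts to index 1 when a thinking block exists."""
    return {"thinking": 0, "text": 1} if has_reasoning else {"text": 0}
```

Keeping the index assignment in one place is what lets a streaming implementation emit exactly one `content_block_start` per block, which is the duplication the second bullet in the commit says it fixed.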
Tarek Dakhran
73d284a250
model : add LFM2-ColBert-350M (#18607)
...
* model : add LFM2-ColBert-350M
* llama_model_n_embd_out() - returns `hparams.n_embd_out` if set and falls back to `hparams.n_embd`
2026-01-05 19:52:56 +01:00
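The fallback rule for `llama_model_n_embd_out()` in the commit above amounts to the following, sketched in Python. Treating `0` as "not set" is an assumption made for the example; the log does not say how an unset value is represented.

```python
from dataclasses import dataclass


@dataclass
class HParams:
    n_embd: int
    n_embd_out: int = 0  # assumption: 0 means "not set"


def n_embd_out(hparams: HParams) -> int:
    # Prefer the dedicated output width, otherwise use the hidden width.
    return hparams.n_embd_out if hparams.n_embd_out > 0 else hparams.n_embd
```

A model whose output embedding is narrower than its hidden state would set both fields, while every other model keeps reporting `n_embd` as before.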
Aleksander Grygier
2d6020b574
feat: Enable adding System Prompt per-chat
2026-01-05 14:30:11 +01:00
Vladislav Sayapin
da143b9940
server : fix router child env in containerized environments (#18562)
2026-01-05 14:12:05 +01:00
Aleksander Grygier
469263668f
fix: UI
2026-01-05 11:59:31 +01:00
Aleksander Grygier
cf37390434
chore: update webui build output
2026-01-05 11:57:23 +01:00
Aleksander Grygier
f3734b5b7c
feat: UI improvements
2026-01-05 11:53:53 +01:00
Pascal
653f85fedd
webui: raw tool result display, strip only leading/trailing newlines to preserve indentation
2026-01-05 09:01:31 +01:00
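The difference the commit above is after, stripping only the outer newlines rather than all outer whitespace, is easy to see in a small Python sketch. The helper name is hypothetical and the webui itself is not written in Python.

```python
def trim_outer_newlines(raw: str) -> str:
    # Remove only leading/trailing newline characters, not spaces or tabs.
    return raw.strip("\n")


sample = "\n    indented line\n  next\n\n"
kept = trim_outer_newlines(sample)   # first line keeps its four spaces
lost = sample.strip()                # a plain strip() drops that indentation
```

For tool output such as code or aligned tables, the leading spaces on the first line carry meaning, so the narrower trim is the safer default.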
Pascal
fc7218ae11
webui: split raw output into backend parsing and frontend display options
2026-01-05 09:01:31 +01:00
Pascal
4f9d9d41b9
webui: remove legacy wrapper and restore WebSocket transport
2026-01-05 09:01:31 +01:00
Pascal
183d9eebff
webui: remove unused imports
2026-01-05 09:01:31 +01:00
Aleksander Grygier
f7ea69fa18
chore: update webui build output
2026-01-05 09:01:31 +01:00
Aleksander Grygier
c5d01fbb8f
feat: Improve agentic tool call streaming display with 'in progress' state
2026-01-05 09:01:31 +01:00
Aleksander Grygier
f755673c6f
feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides
2026-01-05 09:01:31 +01:00
Aleksander Grygier
81ad2d5569
feat: Add per-chat MCP server overrides
2026-01-05 09:01:31 +01:00
Aleksander Grygier
865c28a96d
chore: update webui build output
2026-01-05 09:01:31 +01:00
Aleksander Grygier
2592471d11
feat: Add image load error fallback in MarkdownContent
2026-01-05 09:01:31 +01:00
Aleksander Grygier
069be7b517
feat: Implement lazy MCP client shutdown
2026-01-05 09:01:31 +01:00
Aleksander Grygier
9571e07687
feat: Enhance tool call streaming UI and output format
2026-01-05 09:01:31 +01:00
Aleksander Grygier
260375819d
feat: Display and manage servers in ChatForm actions
2026-01-05 09:01:31 +01:00
Aleksander Grygier
74345d8785
feat: Integrate server management dialog into chat settings
2026-01-05 09:01:31 +01:00
Aleksander Grygier
dde5e1582c
feat: Implement dedicated server management UI components
2026-01-05 09:01:31 +01:00
Aleksander Grygier
c24d5e36f0
refactor: Centralize health check logic in store
2026-01-05 09:01:31 +01:00
Aleksander Grygier
f87b10ee66
feat: Enhance server config with headers and schema normalization
2026-01-05 09:01:31 +01:00
Aleksander Grygier
778ad550b1
feat: Add McpLogo Svelte component
2026-01-05 09:01:31 +01:00
Aleksander Grygier
c1c2234a62
refactor: Consolidate UI CSS classes into shared module
2026-01-05 09:01:31 +01:00
Aleksander Grygier
883d2a4f15
chore: update webui build output
2026-01-05 09:01:31 +01:00
Aleksander Grygier
7d5fd37324
feat: Raw LLM output switch per message
2026-01-05 09:01:31 +01:00
Aleksander Grygier
03464a0780
refactor: Tool call handling
2026-01-05 09:01:31 +01:00
Aleksander Grygier
3e7318f09d
docs: Update high-level architecture diagrams for MCP integration
2026-01-05 09:01:15 +01:00