Commit Graph

459 Commits

Author SHA1 Message Date
Pascal 78c6380222 refactor: remove reasoning after first turn filter 2026-01-16 15:19:50 +01:00
Pascal 2973c64609 refactor: inline reasoning with tags, remove fixed thinking field 2026-01-16 15:19:42 +01:00
Xuan-Son Nguyen c15395f73c
common : implement new jinja template engine (#18462)
* jinja vm

* lexer

* add vm types

* demo

* clean up

* parser ok

* binary_expression::execute

* shadow naming

* bin ops works!

* fix map object

* add string builtins

* add more builtins

* wip

* use mk_val

* eval with is_user_input

* render gemma tmpl ok

* track input string even after transformations

* support bound functions

* keyword arguments and slicing array

* use shared_ptr for values

* add mk_stmt

* allow print source on exception

* fix negate test

* testing more templates

* mostly works

* add filter_statement

* allow func to access ctx

* add jinja-value.cpp

* impl global_from_json

* a lot of fixes

* more tests

* more fix, more tests

* more fixes

* rm workarounds

* demo: type inference

* add placeholder for tojson

* improve function args handling

* rm type inference

* no more std::regex

* trailing spaces

* make testing more flexible

* make output a bit cleaner

* (wip) redirect minja calls

* test: add --output

* fix crash on macro kwargs

* add minimal caps system

* add some workarounds

* rm caps_apply_workarounds

* get rid of preprocessing

* more fixes

* fix test-chat-template

* move test-chat-jinja into test-chat-template

* rm test-chat-jinja from cmake

* test-chat-template: use common

* fix build

* fix build (2)

* rename vm --> interpreter

* improve error reporting

* correct lstrip behavior

* add tojson

* more fixes

* disable tests for COMMON_CHAT_FORMAT_GENERIC

* make sure tojson output correct order

* add object.length

* fully functional selectattr / rejectattr

* improve error reporting

* more builtins added, more fixes

* create jinja rendering tests

* fix testing.h path

* adjust whitespace rules

* more fixes

* temporarily disable test for ibm-granite

* r/lstrip behavior matched with hf.js

* minimax, glm4.5 ok

* add append and pop

* kimi-k2 ok

* test-chat passed

* fix lstrip_block

* add more jinja tests

* cast to unsigned char

* allow dict key to be numeric

* nemotron: rm windows newline

* tests ok

* fix test

* rename interpreter --> runtime

* fix build

* add more checks

* bring back generic format support

* fix Apertus

* [json.exception.out_of_range.403] key 'content' not found

* rm generic test

* refactor input marking

* add docs

* fix windows build

* clarify error message

* improved tests

* split/rsplit with maxsplit

* non-inverse maxsplit

forgot to change after simplifying

* implement separators for tojson and fix indent

* i like to move it move it

* rename null --> none

* token::eof

* some nits + comments

* add exception classes for lexer and parser

* null -> none

* rename global -> env

* rm minja

* update docs

* docs: add input marking caveats

* implement missing jinja-tests functions

* oops

* support trim filter with args, remove bogus to_json reference

* numerous argument fixes

* updated tests

* implement optional strip chars parameter

* use new chars parameter

* float filter also has default

* always leave at least one decimal in float string

* jinja : static analysis + header cleanup + minor fixes

* add fuzz test

* add string.cpp

* fix chat_template_kwargs

* nits

* fix build

* revert

* unrevert

sorry :)

* add fuzz func_args, refactor to be safer

* fix array.map()

* loosen ensure_vals max count condition, add not impl for map(int)

* hopefully fix windows

* check if empty first

* normalize newlines

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 11:22:06 +01:00
Pascal a1550ab77d chore: update webui build output 2026-01-16 11:02:17 +01:00
Pascal db37b712b2 feat: resolve MCP attachment images via rehype plugin
The LLM can reference tool-generated images using markdown links;
the plugin resolves attachment names to base64 from message.extra when present,
and regular HTTP/data URLs pass through unchanged (no regression).

- rehypeResolveAttachmentImages plugin in markdown pipeline
- Pass message prop to MarkdownContent and AgenticContent
- Force processor reactivity on message.extra changes
- Filter assistant images from API context (display-only)
2026-01-16 10:49:28 +01:00
Pascal a3c2144c1d feat: persist base64 attachments from tool results 2026-01-16 08:07:20 +01:00
Pascal a377605f60 webui: fix custom headers persistence in UI (derived) 2026-01-15 20:36:14 +01:00
Pascal 3360f60b94 webui: fix custom headers persistence in UI 2026-01-15 20:13:01 +01:00
ddh0 13f1e4a9ca
llama : add adaptive-p sampler (#17927)
* initial commit for branch

* simplify constants

* add params to `struct common_params_sampling`, add reference to PR

* explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`

* add args, rename `queue_size` -> `window_size`

* improved comments

* minor

* remove old unused code from algorithm

* minor

* add power law case to `common_sampler_init`, add sampler name mappings

* clarify behaviour when `window_size = 0`

* add missing enums

* remove `target_range` param, make `target == 1` no-op, cleanup code

* oops, straggler

* add missing parameters in `server-task.cpp`

* copy from author

ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

* remove old debug log, style nit

* fix compiler warning, add commented-out logging per token

* re-write + change parameters + simplify

* oops forgot args.cpp

* fix leftover `window_size`

* add missing values to `common_params_sampling::print()`

* with logging

* does this fix it?

* no, but does this?

* update default decay

* optimize

* fix bad merge

my git skills are lacking

* silence `missing initializer for member`

* update default decay to 0.9

* fix logging

* format (double)

* add power law to the new `samplers` vector

* log sampler init values

* improve logging messages in llama_sampler_power_law

* remove extraneous logging

* simplify target computation

last commit with debug logging!

* remove debug logging, explicitly clamp params at init

* add `use_power_law` flag + logic, minor cleanup

* update `power-law` -> `adaptive-p`

* fix cold start EMA

- `ctx->weighted_sum` is now initialized and reset to `target / (1.0f -
clamped_decay)`
- `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f -
clamped_decay)`

this fixes a "cold start" problem with the moving average

* update `SHARPNESS` constant to `10.0f`

* minor style fixes

no functional changes

* minor style fixes cont.

* update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)

* separate into `apply` + `accept` functions

* `pending_token_idx`: switch from `llama_token` to `int32`

functionally identical (`llama.h` has `typedef int32_t llama_token;`),
but it's more correct now

* don't transform logits <= -1e9f

* fix masking in backend top-p, min-p

* address review comments

* typo in comments `RND` -> `RNG`

* add docs

* add recommended values in completion docs

* address PR feedback

* remove trailing whitespace (for CI `editorconfig`)

* add adaptive-p to `common_sampler_types_from_chars`
2026-01-15 19:16:29 +02:00
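Note on the "fix cold start EMA" item in the adaptive-p commit above: the message states that `weighted_sum` starts at `target / (1.0f - clamped_decay)` and `total_weight` at `1.0f / (1.0f - clamped_decay)`. The sketch below shows why that seeding makes the moving average begin exactly at the target. Only those two initial values come from the commit message. The struct, the function names, the clamp range, and the update rule are assumptions made for illustration and are not taken from the actual sampler code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical state holder. Only the two seeded fields are named in the
// commit message; everything else here is assumed for illustration.
struct ema_state {
    float decay;        // clamped decay factor
    float weighted_sum; // decayed sum of observed values
    float total_weight; // decayed sum of weights
};

// Seed the accumulators as if an infinite history of `target` values had
// already been observed (the geometric series 1 + d + d^2 + ... = 1 / (1 - d)).
static ema_state ema_init(float target, float decay) {
    // Assumed clamp range; it keeps (1 - decay) strictly positive.
    const float clamped_decay = std::min(std::max(decay, 0.0f), 0.99f);
    ema_state s;
    s.decay        = clamped_decay;
    s.weighted_sum = target / (1.0f - clamped_decay);
    s.total_weight = 1.0f   / (1.0f - clamped_decay);
    return s;
}

// Assumed update rule: decay the history, then add the new observation.
static void ema_accept(ema_state & s, float value) {
    s.weighted_sum = s.decay * s.weighted_sum + value;
    s.total_weight = s.decay * s.total_weight + 1.0f;
}

// Current weighted average of the observations.
static float ema_mean(const ema_state & s) {
    assert(s.total_weight > 0.0f);
    return s.weighted_sum / s.total_weight;
}
```

Under these assumptions, `ema_mean(ema_init(0.5f, 0.9f))` is approximately 0.5 before any value is accepted, and a single observation moves the mean by only a fraction of its distance from the target. Had both accumulators started at zero, the first observation would have determined the mean entirely, which is one plausible reading of the "cold start" problem the commit mentions.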
Aleksander Grygier cffc3b46ae fix: Word wrapping 2026-01-15 17:59:57 +01:00
Xuan-Son Nguyen a04c2b06a3
server: improve slots scheduling for n_cmpl (#18789)
* server : make sure children tasks are scheduled to launch with parent

* fix

* add comment pointing to this PR

* fix

* clean up

* more debug messages

* add pop_deferred_task with specific ID version

* improve the logic

* simple approach

* no double move

* correct return type of launch_slots_with_parent_task
2026-01-15 17:10:28 +01:00
Georgi Gerganov 39173bcacb
context : reserve new scheduler when graph topology changes (#18547)
* context : reserve new scheduler when graph topology changes

* cont : fix

* cont : fix reserve

* cont : reserve only when changes occur + timing

* context : add comments

* llama : reserve on sampler changes

* common : allow null common_sampler

* server : task declares needs (embd, logits, sampling)

* server : do not init sampler if not needed

* llama : fix need_reserve when unsetting a sampler

* server : consolidate slot reset/clear logic
2026-01-15 16:39:17 +02:00
Aleksander Grygier 5417a439ef chore: update webui build output 2026-01-15 11:39:10 +01:00
Aleksander Grygier 30a585bb96 feat: UI improvements 2026-01-14 17:32:57 +01:00
Aleksander Grygier 886939c550 chore: update webui build output 2026-01-14 14:39:32 +01:00
Aleksander Grygier 39848ee12f feat: UI improvement 2026-01-14 14:26:41 +01:00
Aleksander Grygier c1ac8d7326 chore: update webui build output 2026-01-14 13:22:01 +01:00
Aleksander Grygier afdae742e3 Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp 2026-01-14 13:20:25 +01:00
Aleksander Grygier b11b32ea28 chore: update webui build output 2026-01-14 12:47:13 +01:00
Aleksander Grygier 06efeb6eb9 chore: update webui build output 2026-01-14 11:49:26 +01:00
Aleksander Grygier f89bcb90ca feat: MCP Server Details 2026-01-14 11:45:47 +01:00
Aleksander Grygier 120f3c978c chore: update webui build output 2026-01-12 18:27:54 +01:00
Aleksander Grygier 5407b2efab feat: MCP connection details WIP 2026-01-12 18:26:48 +01:00
Radoslav Gerganov bcf7546160
server : add arg for disabling prompt caching (#18776)
* server : add arg for disabling prompt caching

Disabling prompt caching is useful for clients who are restricted to
sending only OpenAI-compat requests and want deterministic
responses.

* address review comments

* address review comments
2026-01-12 19:21:34 +02:00
Aleksander Grygier 0009c0c300 refactor: MCP types and health check 2026-01-12 18:12:08 +01:00
Aleksander Grygier 0180becb8b chore: update webui build output 2026-01-12 15:26:46 +01:00
Aleksander Grygier 08c1acd1db refactor: KeyValuePairs component 2026-01-12 15:25:43 +01:00
Aleksander Grygier 392a6dce0d chore: update webui build output 2026-01-12 15:15:19 +01:00
Aleksander Grygier a44332b528 refactor: DRY 2026-01-12 15:10:18 +01:00
Aleksander Grygier 80e829a248 chore: update webui build output 2026-01-12 14:49:11 +01:00
Aleksander Grygier 60ef752d0f refactor: Architecture improvements 2026-01-12 14:45:24 +01:00
Aleksander Grygier a63a421952 chore: update webui build output 2026-01-12 14:18:15 +01:00
Aleksander Grygier 58ab834b18 refactor: MCP state management + stores/clients relationship 2026-01-12 14:17:06 +01:00
Xuan-Son Nguyen ce3bf9b1a4
server: update docs for sleeping [no ci] (#18777) 2026-01-12 13:01:24 +01:00
Aleksander Grygier 9c53bd4486 chore: update webui build output 2026-01-12 11:16:18 +01:00
Aleksander Grygier 528a560a25 fix: Distinguish streaming vs incomplete tool calls in UI 2026-01-12 11:15:58 +01:00
Aleksander Grygier aa9054367a chore: update webui build output 2026-01-12 11:10:24 +01:00
Aleksander Grygier cead02ee58 fix: Restore live reactive UI progress for tool calls 2026-01-12 11:07:56 +01:00
Aleksander Grygier c6843d0054 chore: update webui build output 2026-01-12 11:02:42 +01:00
Aleksander Grygier b5226ebd86 Merge origin/allozaur/mcp-mvp: enable streaming of tool call arguments
Resolves conflicts by:
- Keeping clean store architecture (agentic.svelte.ts delegates to client)
- Updating agentic.client.ts to use TOOL_ARGS_START/END format
- Accepting remote AgenticContent.svelte with direct JSON parsing
- Updating ChatMessageAssistant to match new AgenticContent props
2026-01-12 10:55:34 +01:00
Aleksander Grygier 01dfe0ee4c chore: update webui build output 2026-01-12 10:37:12 +01:00
Aleksander Grygier 144148125b refactor: Cleanup 2026-01-12 10:28:59 +01:00
Pascal a02acca38d fix: reset tool call state between turns 2026-01-10 19:14:13 +01:00
Pascal b7288a4dd7 webui: enable streaming of tool call arguments 2026-01-10 18:59:57 +01:00
Georgi Gerganov f307926482
server : adjust unified KV cache tests (#18716) 2026-01-10 17:51:56 +02:00
Xuan-Son Nguyen 9ac2693a30
server: fix n_cmpl not skipping processing prompt (#18663)
* server: fix n_cmpl not skipping processing

* fix infinite loop on empty batch

* cont : init child samplers + modify child logic

* cont : cleanup

* cont : improve n_cmpl logic

- launch the parent task first so it finds the slot with best cache
- parent task waits for child tasks to be launched
- when a child task finishes - remove its cache

* cont : remove redundant function

* cont : reduce parent checks

* fix : nullptr task dereference

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-10 00:00:41 +01:00
Pascal ec8fd7876b
Webui/file upload (#18694)
* webui: fix restrictive file type validation

* webui: simplify file processing logic

* chore: update webui build output

* webui: remove file picker extension whitelist (1/2)

* webui: remove file picker extension whitelist (2/2)

* chore: update webui build output

* refactor: Cleanup

* chore: update webui build output

* fix: update ChatForm storybook test after removing accept attribute

* chore: update webui build output

* refactor: more cleanup

* chore: update webui build output
2026-01-09 16:45:32 +01:00
Georgi Gerganov 53eb9435da
server : fix timing of prompt/generation (#18713) 2026-01-09 12:59:50 +02:00
Georgi Gerganov f5f8812f7c
server : use different seeds for child completions (#18700)
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
2026-01-09 09:33:50 +02:00
Pascal 74b119e81e webui: prevent mobile dropdown immediate close on synthetic click 2026-01-08 22:48:56 +01:00
Pascal d000d84201 webui: fix redirect to root ignoring base path 2026-01-08 15:33:23 +01:00
Aleksander Grygier 2c0add6a90 Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp 2026-01-08 15:02:05 +01:00
Aleksander Grygier e3ca595651 chore: update webui build output 2026-01-08 14:54:45 +01:00
Aleksander Grygier 6f7750489e refactor: Types 2026-01-08 14:45:47 +01:00
Aleksander Grygier dfd3031b17 refactor: Componentize McpServerCard 2026-01-08 14:18:30 +01:00
Aleksander Grygier 835c06e0d1 refactor: Cleanup 2026-01-08 14:18:12 +01:00
Aleksander Grygier ddbb7dc2e5 fix: Remove redundant CSS class 2026-01-08 14:11:52 +01:00
Adrien Gallouët 55abc39355
vendor : update cpp-httplib to 0.30.0 (#18660)
* vendor : update cpp-httplib to 0.30.0
* common : allow custom headers when downloading
2026-01-08 13:53:54 +01:00
Aleksander Grygier bf2a793f42
refactor: Cleanup 2026-01-08 13:49:55 +01:00
Aleksander Grygier 089f38230c feat: Add TruncatedText component 2026-01-08 13:02:46 +01:00
Aleksander Grygier 06febe08b7 fix: Collapsible box trigger 2026-01-08 12:48:15 +01:00
Aleksander Grygier 223c6333e9 refactor: Cleanup 2026-01-08 12:46:10 +01:00
Aleksander Grygier b0ba550928 refactor: Cleanup 2026-01-08 12:03:36 +01:00
Aleksander Grygier 56b34bf63b refactor: Collapsible Content Block & small fixes 2026-01-08 09:17:24 +01:00
Aleksander Grygier d89ada8cee chore: update webui build output 2026-01-07 15:46:32 +01:00
Aleksander Grygier 98bce85b1f refactor: Cleanup 2026-01-07 15:44:23 +01:00
Aleksander Grygier b9adc00d3f chore: update webui build output 2026-01-07 14:27:48 +01:00
Aleksander Grygier 10e5ad1396 feat: UI improvements 2026-01-07 14:01:27 +01:00
Aleksander Grygier bc07e0723d feat: Always show Mcp Selector 2026-01-07 14:01:27 +01:00
Pascal 4c095df509 fix: remove double scrollbar in model selector by using Bits UI content available height 2026-01-07 12:23:03 +01:00
R 3d26a09dc7
server : add thinking content blocks to Anthropic Messages API (#18551)
* server : add thinking content blocks to Anthropic Messages API

Add support for returning reasoning/thinking content in Anthropic API
responses when using models with --reasoning-format deepseek and the
thinking parameter enabled.

- Non-streaming: adds thinking block before text in content array
- Streaming: emits thinking_delta events with correct block indices
- Partial streaming: tracks reasoning state across chunks via
  anthropic_has_reasoning member variable

Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.

* server : fix Anthropic API streaming for thinking content blocks

Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.

* server: refactor Anthropic streaming state to avoid raw pointer

Replace raw pointer to task_result_state with direct field copies:
- Copy state fields in update() before processing chunk
- Use local copies in to_json_anthropic() instead of dereferencing
- Pre-compute state updates for next chunk in update()

This makes the data flow clearer and avoids unsafe pointer patterns.
2026-01-06 16:17:13 +01:00
Tarek Dakhran 73d284a250
model : add LFM2-ColBert-350M (#18607)
* model : add LFM2-ColBert-350M

* llama_model_n_embd_out() - returns `hparams.n_embd_out` if set and falls back to `hparams.n_embd`
2026-01-05 19:52:56 +01:00
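Note on `llama_model_n_embd_out()` in the LFM2-ColBert commit above: the message describes a fallback from `hparams.n_embd_out` to `hparams.n_embd`. The sketch below illustrates that pattern only. The struct, the helper name, and the convention that `0` means "not set" are assumptions and may not match the real code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for the model hyperparameters.
struct hparams_sketch {
    uint32_t n_embd     = 0; // hidden embedding size
    uint32_t n_embd_out = 0; // output embedding size; assumption: 0 means "not set"
};

// Return the output embedding size when it is set, otherwise fall back
// to the regular embedding size.
static uint32_t n_embd_out_or_default(const hparams_sketch & hp) {
    return hp.n_embd_out > 0 ? hp.n_embd_out : hp.n_embd;
}
```

With this shape, callers that size output buffers can always ask for the output dimension and still get the usual embedding size for models that do not define a separate one.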
Aleksander Grygier 2d6020b574 feat: Enable adding System Prompt per-chat 2026-01-05 14:30:11 +01:00
Vladislav Sayapin da143b9940
server : fix router child env in containerized environments (#18562) 2026-01-05 14:12:05 +01:00
Aleksander Grygier 469263668f fix: UI 2026-01-05 11:59:31 +01:00
Aleksander Grygier cf37390434 chore: update webui build output 2026-01-05 11:57:23 +01:00
Aleksander Grygier f3734b5b7c feat: UI improvements 2026-01-05 11:53:53 +01:00
Pascal 653f85fedd webui: raw tool result display, strip only leading/trailing newlines to preserve indentation 2026-01-05 09:01:31 +01:00
Pascal fc7218ae11 webui: split raw output into backend parsing and frontend display options 2026-01-05 09:01:31 +01:00
Pascal 4f9d9d41b9 webui: remove legacy wrapper and restore WebSocket transport 2026-01-05 09:01:31 +01:00
Pascal 183d9eebff webui: remove unused imports 2026-01-05 09:01:31 +01:00
Aleksander Grygier f7ea69fa18 chore: update webui build output 2026-01-05 09:01:31 +01:00
Aleksander Grygier c5d01fbb8f feat: Improve agentic tool call streaming display with 'in progress' state 2026-01-05 09:01:31 +01:00
Aleksander Grygier f755673c6f feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides 2026-01-05 09:01:31 +01:00
Aleksander Grygier 81ad2d5569 feat: Add per-chat MCP server overrides 2026-01-05 09:01:31 +01:00
Aleksander Grygier 865c28a96d chore: update webui build output 2026-01-05 09:01:31 +01:00
Aleksander Grygier 2592471d11 feat: Add image load error fallback in MarkdownContent 2026-01-05 09:01:31 +01:00
Aleksander Grygier 069be7b517 feat: Implement lazy MCP client shutdown 2026-01-05 09:01:31 +01:00
Aleksander Grygier 9571e07687 feat: Enhance tool call streaming UI and output format 2026-01-05 09:01:31 +01:00
Aleksander Grygier 260375819d feat: Display and manage servers in ChatForm actions 2026-01-05 09:01:31 +01:00
Aleksander Grygier 74345d8785 feat: Integrate server management dialog into chat settings 2026-01-05 09:01:31 +01:00
Aleksander Grygier dde5e1582c feat: Implement dedicated server management UI components 2026-01-05 09:01:31 +01:00
Aleksander Grygier c24d5e36f0 refactor: Centralize health check logic in store 2026-01-05 09:01:31 +01:00
Aleksander Grygier f87b10ee66 feat: Enhance server config with headers and schema normalization 2026-01-05 09:01:31 +01:00
Aleksander Grygier 778ad550b1 feat: Add McpLogo Svelte component 2026-01-05 09:01:31 +01:00
Aleksander Grygier c1c2234a62 refactor: Consolidate UI CSS classes into shared module 2026-01-05 09:01:31 +01:00
Aleksander Grygier 883d2a4f15 chore: update webui build output 2026-01-05 09:01:31 +01:00
Aleksander Grygier 7d5fd37324 feat: Raw LLM output switch per message 2026-01-05 09:01:31 +01:00
Aleksander Grygier 03464a0780 refactor: Tool call handling 2026-01-05 09:01:31 +01:00
Aleksander Grygier 3e7318f09d docs: Update high-level architecture diagrams for MCP integration 2026-01-05 09:01:15 +01:00