Commit Graph

564 Commits

Author SHA1 Message Date
Aleksander Grygier 39848ee12f feat: UI improvement 2026-01-14 14:26:41 +01:00
Aleksander Grygier c1ac8d7326 chore: update webui build output 2026-01-14 13:22:01 +01:00
Aleksander Grygier afdae742e3 Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp 2026-01-14 13:20:25 +01:00
Aleksander Grygier b11b32ea28 chore: update webui build output 2026-01-14 12:47:13 +01:00
Aleksander Grygier 06efeb6eb9 chore: update webui build output 2026-01-14 11:49:26 +01:00
Aleksander Grygier f89bcb90ca feat: MCP Server Details 2026-01-14 11:45:47 +01:00
Xuan-Son Nguyen e047f9ee9d mtmd: fix use_non_causal being reported incorrectly (#18793)
* mtmd: fix use_non_causal being reported incorrectly

* move clip_is_mrope to mtmd_decode_use_mrope

* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Ruben Ortlam db79dc06b1 llama-bench: add direct_io parameter (#18778) 2026-01-13 08:49:10 +01:00
Aleksander Grygier 120f3c978c chore: update webui build output 2026-01-12 18:27:54 +01:00
Aleksander Grygier 5407b2efab feat: MCP connection details WIP 2026-01-12 18:26:48 +01:00
Radoslav Gerganov bcf7546160 server : add arg for disabling prompt caching (#18776)
* server : add arg for disabling prompt caching

Disabling prompt caching is useful for clients who are restricted to
sending only OpenAI-compat requests and want deterministic
responses.

* address review comments

* address review comments
2026-01-12 19:21:34 +02:00
Aleksander Grygier 0009c0c300 refactor: MCP types and health check 2026-01-12 18:12:08 +01:00
Aleksander Grygier 0180becb8b chore: update webui build output 2026-01-12 15:26:46 +01:00
Aleksander Grygier 08c1acd1db refactor: KeyValuePairs component 2026-01-12 15:25:43 +01:00
Aleksander Grygier 392a6dce0d chore: update webui build output 2026-01-12 15:15:19 +01:00
Aleksander Grygier a44332b528 refactor: DRY 2026-01-12 15:10:18 +01:00
Aleksander Grygier 80e829a248 chore: update webui build output 2026-01-12 14:49:11 +01:00
Aleksander Grygier 60ef752d0f refactor: Architecture improvements 2026-01-12 14:45:24 +01:00
Aleksander Grygier a63a421952 chore: update webui build output 2026-01-12 14:18:15 +01:00
Aleksander Grygier 58ab834b18 refactor: MCP state management + stores/clients relationship 2026-01-12 14:17:06 +01:00
Xuan-Son Nguyen ce3bf9b1a4 server: update docs for sleeping [no ci] (#18777) 2026-01-12 13:01:24 +01:00
Aleksander Grygier 9c53bd4486 chore: update webui build output 2026-01-12 11:16:18 +01:00
Aleksander Grygier 528a560a25 fix: Distinguish streaming vs incomplete tool calls in UI 2026-01-12 11:15:58 +01:00
Aleksander Grygier aa9054367a chore: update webui build output 2026-01-12 11:10:24 +01:00
Aleksander Grygier cead02ee58 fix: Restore live reactive UI progress for tool calls 2026-01-12 11:07:56 +01:00
Aleksander Grygier c6843d0054 chore: update webui build output 2026-01-12 11:02:42 +01:00
Aleksander Grygier b5226ebd86 Merge origin/allozaur/mcp-mvp: enable streaming of tool call arguments
Resolves conflicts by:
- Keeping clean store architecture (agentic.svelte.ts delegates to client)
- Updating agentic.client.ts to use TOOL_ARGS_START/END format
- Accepting remote AgenticContent.svelte with direct JSON parsing
- Updating ChatMessageAssistant to match new AgenticContent props
2026-01-12 10:55:34 +01:00
Aleksander Grygier 01dfe0ee4c chore: update webui build output 2026-01-12 10:37:12 +01:00
Aleksander Grygier 144148125b refactor: Cleanup 2026-01-12 10:28:59 +01:00
Pascal a02acca38d fix: reset tool call state between turns 2026-01-10 19:14:13 +01:00
Pascal b7288a4dd7 webui: enable streaming of tool call arguments 2026-01-10 18:59:57 +01:00
Georgi Gerganov f307926482 server : adjust unified KV cache tests (#18716) 2026-01-10 17:51:56 +02:00
Xuan-Son Nguyen 9ac2693a30 server: fix n_cmpl not skipping processing prompt (#18663)
* server: fix n_cmpl not skipping processing

* fix infinite loop on empty batch

* cont : init child samplers + modify child logic

* cont : cleanup

* cont : improve n_cmpl logic

- launch the parent task first so it finds the slot with best cache
- parent task waits for child tasks to be launched
- when a child task finishes - remove its cache

* cont : remove redundant function

* cont : reduce parent checks

* fix : nullptr task dereference

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-10 00:00:41 +01:00
Simranjeet Singh a61c8bc3bf mtmd: Add Gemma3n multimodal support with MobileNetV5 vision encoder (#18256)
* Add Gemma3nVisionModel - MobileNetV5 vision encoder convertor to convert_hf_to_gguf.py. Add gemma3n to vision projectors in gguf-py/gguf/constants.py.

* Add mobilenetv5 impl

* Fix comments, remove unused vars

* Fix permute and remove transpose of projection weights

* Fix comments, remove debugging prints from hf_to_gguf

* 1. Hard-code image_mean = 0 and image_std = 1
2. Use available tensor mapping logic
3. Remove redundant chat template replacement of soft tokens placeholder with media placeholder

* 1. Move mobilenetv5 helpers declarations to `clip_graph_mobilenetv5` struct and definitions to mobilenetv5.cpp
2. Remove unused `clip_is_gemma3n` func declarations and definitions
3. Remove redundant `rescale_image_u8_to_f32` func and use `normalize_image_u8_to_f32` with zero mean and unit std
4. Calculate n_patches using image_size / patch_size

* Remove obsolete comments

* - convert_hf_to_gguf.py & constants.py & tensor_mapping.py: Use explicit mapping: Custom map for double indexed blocks and tensor_mapping.py for rest
- convert_hf_to_gguf.py: Unsqueeze Stem Bias and Layer scale tensors to correct shape while converting to gguf
- mobilenetv5.cpp: Remove explicit reshaping of Stem Bias and Layer scale which are now handled while converting to gguf, replace fprintf with LOG_*
- clip.cpp: Remove unused embedding and hard_emb_norm tensor loading

* - Rename tensors to v.conv..., v.blk..., v.msfa... to better align with already existing terminology

* Fix stem conv bias name

* Remove explicit handling of bias term for stem conv

* - Change order of addition in "project_per_layer_inputs" to support broadcasting of vision inp_per_layer
- Simplify the vision embeddings path of "get_per_layer_inputs" to output [n_embd_altup, n_layer, 1], broadcastable

* clean up conversion script

* fix code style

* also preserve audio tensors

* trailing space

* split arch A and V

* rm unused gemma3 func

* fix alignment

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-09 23:42:38 +01:00
Pascal ec8fd7876b Webui/file upload (#18694)
* webui: fix restrictive file type validation

* webui: simplify file processing logic

* chore: update webui build output

* webui: remove file picker extension whitelist (1/2)

* webui: remove file picker extension whitelist (2/2)

* chore: update webui build output

* refactor: Cleanup

* chore: update webui build output

* fix: update ChatForm storybook test after removing accept attribute

* chore: update webui build output

* refactor: more cleanup

* chore: update webui build output
2026-01-09 16:45:32 +01:00
Asbjørn Olling a180ba78c7 cmake: only build cli when server is enabled (#18670) 2026-01-09 16:43:26 +01:00
Georgi Gerganov 53eb9435da server : fix timing of prompt/generation (#18713) 2026-01-09 12:59:50 +02:00
Georgi Gerganov f5f8812f7c server : use different seeds for child completions (#18700)
* server : use different seeds for child completions

* cont : handle default seed

* cont : note
2026-01-09 09:33:50 +02:00
Pascal 74b119e81e webui: prevent mobile dropdown immediate close on synthetic click 2026-01-08 22:48:56 +01:00
Pascal d000d84201 webui: fix redirect to root ignoring base path 2026-01-08 15:33:23 +01:00
Aleksander Grygier 2c0add6a90 Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp 2026-01-08 15:02:05 +01:00
Aleksander Grygier e3ca595651 chore: update webui build output 2026-01-08 14:54:45 +01:00
Aleksander Grygier 6f7750489e refactor: Types 2026-01-08 14:45:47 +01:00
Aleksander Grygier dfd3031b17 refactor: Componentize McpServerCard 2026-01-08 14:18:30 +01:00
Aleksander Grygier 835c06e0d1 refactor: Cleanup 2026-01-08 14:18:12 +01:00
Aleksander Grygier ddbb7dc2e5 fix: Remove redundant CSS class 2026-01-08 14:11:52 +01:00
Adrien Gallouët 55abc39355 vendor : update cpp-httplib to 0.30.0 (#18660)
* vendor : update cpp-httplib to 0.30.0
* common : allow custom headers when downloading
2026-01-08 13:53:54 +01:00
Aleksander Grygier bf2a793f42 refactor: Cleanup 2026-01-08 13:49:55 +01:00
Aleksander Grygier 089f38230c feat: Add TruncatedText component 2026-01-08 13:02:46 +01:00
Aleksander Grygier 06febe08b7 fix: Collapsible box trigger 2026-01-08 12:48:15 +01:00