Commit Graph

7839 Commits

Author SHA1 Message Date
Pascal 5c28b7a2ee chore: update webui build output 2026-01-17 18:38:50 +01:00
Pascal fca7177eae fix: ignore assistant attachments (MCP) for modality detection 2026-01-17 18:36:41 +01:00
Pascal 3572667788 chore: update webui build output 2026-01-17 16:35:54 +01:00
Pascal 506da17931 refactor: eliminate MCP circular dependency
- Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore
- Remove bidirectional callback pattern (set*Callback, notify* methods)
- Add updateState/updateHealthCheck public methods in mcpStore
- Replace callback calls with direct mcpStore method calls
- Remove unused imports (browser, HealthCheckState) and constructor
- Fixes CI: ReferenceError Cannot access mcpClient before initialization
2026-01-17 16:30:42 +01:00
Pascal 9b3417703f fix: remove obsolete modality UI tests causing CI failures
- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)
2026-01-17 16:30:36 +01:00
Pascal a723238245 chore: update webui build output 2026-01-16 19:52:23 +01:00
Pascal 229aba7c3e fix: strip reasoning content and UI proprietary tags from prompts
TODO: add toggle and ensure backend API compliance for reasoning format
2026-01-16 19:50:36 +01:00
Pascal f09395821b chore: update webui build output 2026-01-16 15:22:46 +01:00
Pascal 78c6380222 refactor: remove reasoning after first turn filter 2026-01-16 15:19:50 +01:00
Pascal 2973c64609 refactor: inline reasoning with tags, remove fixed thinking field 2026-01-16 15:19:42 +01:00
Pascal a1550ab77d chore: update webui build output 2026-01-16 11:02:17 +01:00
Pascal db37b712b2 feat: resolve MCP attachment images via rehype plugin
LLM can reference tool-generated images using markdown links like,
plugin resolves attachment names to base64 from message.extra when present,
regular HTTP/data URLs pass through unchanged (no regression)

- rehypeResolveAttachmentImages plugin in markdown pipeline
- Pass message prop to MarkdownContent and AgenticContent
- Force processor reactivity on message.extra changes
- Filter assistant images from API context (display-only)
2026-01-16 10:49:28 +01:00
Pascal a3c2144c1d feat: persist base64 attachments from tool results 2026-01-16 08:07:20 +01:00
Pascal a377605f60 webui: fix custom headers persistence in UI (derived) 2026-01-15 20:36:14 +01:00
Pascal 3360f60b94 webui: fix custom headers persistence in UI 2026-01-15 20:13:01 +01:00
Aleksander Grygier cffc3b46ae fix: Word wrapping 2026-01-15 17:59:57 +01:00
Aleksander Grygier 5417a439ef chore: update webui build output 2026-01-15 11:39:10 +01:00
Aleksander Grygier 30a585bb96 feat: UI improvements 2026-01-14 17:32:57 +01:00
Aleksander Grygier 886939c550 chore: update webui build output 2026-01-14 14:39:32 +01:00
Aleksander Grygier 39848ee12f feat: UI improvement 2026-01-14 14:26:41 +01:00
Aleksander Grygier c1ac8d7326 chore: update webui build output 2026-01-14 13:22:01 +01:00
Aleksander Grygier afdae742e3 Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp 2026-01-14 13:20:25 +01:00
Aleksander Grygier b11b32ea28 chore: update webui build output 2026-01-14 12:47:13 +01:00
Aleksander Grygier 06efeb6eb9 chore: update webui build output 2026-01-14 11:49:26 +01:00
Aleksander Grygier f89bcb90ca feat: MCP Server Details 2026-01-14 11:45:47 +01:00
Jeff Bolz 3e4bb29666
vulkan: Check maxStorageBufferRange in supports_op (#18709)
* vulkan: Check maxStorageBufferRange in supports_op

* skip maxStorageBufferRange check when shader64BitIndexing is enabled
2026-01-14 10:59:05 +01:00
Aman Gupta 47f9612492
llama-model: fix unfortunate typo (#18832) 2026-01-14 17:55:15 +08:00
Daniel Bevenius 01cbdfd7eb
CUDA : fix typo in clang pragma comment [no ci] (#18830) 2026-01-14 10:31:49 +01:00
Ruben Ortlam 635ef78ec5
vulkan: work around Intel fp16 bug in mmq (#18814) 2026-01-14 09:41:23 +01:00
Perry Naseck 7d587e5544
ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705) 2026-01-14 09:22:25 +02:00
Daniel Benjaminsson d34aa07193
mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819)
Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS.
Skip the resource limit check on Haiku to allow mlock functionality
to work without compile errors.

Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.
2026-01-14 09:11:05 +02:00
Adrien Gallouët f709c7a33f
ci, tests : use cmake to download models and remove libcurl dependency (#18791)
* ci, tests : use cmake to download models and remove libcurl dependency
* llama_dl_model -> llama_download_model
* use EXPECTED_HASH for robust model downloading
* Move llama_download_model to cmake/common.cmake

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-14 07:46:27 +01:00
ddh0 6e36299b47
llama : print_info alignment fix (#18708)
* fix text spacing in print_info

* align all
2026-01-14 00:05:11 +01:00
Junwon Hwang 60591f01d4
model : add EXAONE MoE (#18543)
* Add EXAONE MoE implementations

Co-authored-by: Junwon Hwang <nuclear1221@gmail.com>

* Address PR feedback

* Address PR feedback

* [WIP] Add MTP for EXAONE-MoE

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

* Address PR feedback

---------

Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>
2026-01-13 23:28:38 +01:00
Georgi Gerganov e4832e3ae4
vocab : fix attribute overrides for harmony (#18806)
* vocab : fix attribute overrides for harmony

* cont : add warning log
2026-01-13 17:40:13 +02:00
Ruben Ortlam 960e5e3b46
llama-mmap: fix direct-io loading fallback EOF exception (#18801) 2026-01-13 15:57:07 +01:00
Daniel Bevenius 20ca2e12c4
model-conversion : remove -c 0 from model card template [no ci] (#18807)
This commit removes the `-c, --ctx-size N` from the llama-server
command in the model card template for causal models.

The motivation for this is that -c 0 is the default and specifying it
is redundant.
2026-01-13 14:13:10 +01:00
yulo ea4a321f2a
HIP: add fattn-mma-f16 for RDNA4 (#18481)
* finish VQ mma

* flash_attn_ext_f16_iter

* KQ_rowsum

* correct exp

* fix scale error

* fix softmax scale

* fix softmax scale

* enable fattn on cpu side

* fix random error

* disable fattn-mma-f16 on rdna3

* fix wrong col for rdna

* use identity mat to transpose

* resolve conflicts

* basic tuning for DeepSeek-R1-Distill-Qwen-1.5B

* fix volta compile error

* align rdna4 policy for fattn

* adjust fattn policy

* adjust kernel selection logic

* update as the review comments

* keep fattn-wmma logic

* adjust kernel selection logic

---------

Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-13 13:52:16 +01:00
Johannes Gäßler c1e79e610f
doc: ban AI-generated PR descriptions [no ci] (#18765) 2026-01-13 13:43:12 +01:00
Xuan-Son Nguyen e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly (#18793)
* mtmd: fix use_non_causal being reported incorrectly

* move clip_is_mrope to mtmd_decode_use_mrope

* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Georgi Gerganov 0a57271ab6
CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (#18800) 2026-01-13 12:25:53 +02:00
Gabe Goodhart 076b0faf7d
graph : clean up t5 input builders (#18795)
* fix: Remove unnecessary `h` loops where `h` was only ever 0

Branch: CleanUpT5InputBuilders

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix: Remove unnecessary padding loop that is never hit anymore

The upper bound used to use GGML_PAD(n_tokens, GGML_KQ_MASK_PAD), but was
removed in https://github.com/ggml-org/llama.cpp/pull/17910 leaving the
loop dead.

Branch: CleanUpT5InputBuilders

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2026-01-13 09:43:51 +01:00
Ruben Ortlam db79dc06b1
llama-bench: add direct_io parameter (#18778) 2026-01-13 08:49:10 +01:00
Adrien Gallouët 537d4240d4
ci : remove libcurl in releases (#18775)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-12 21:43:02 +01:00
Aleksander Grygier 120f3c978c chore: update webui build output 2026-01-12 18:27:54 +01:00
Aleksander Grygier 5407b2efab feat: MCP connection details WIP 2026-01-12 18:26:48 +01:00
Radoslav Gerganov bcf7546160
server : add arg for disabling prompt caching (#18776)
* server : add arg for disabling prompt caching

Disabling prompt caching is useful for clients who are restricted to
sending only OpenAI-compat requests and want deterministic
responses.

* address review comments

* address review comments
2026-01-12 19:21:34 +02:00
Aleksander Grygier 0009c0c300 refactor: MCP types and health check 2026-01-12 18:12:08 +01:00
Adrien Gallouët 36c5913c45
ci : use openssl for openEuler-latest-cmake-cann (#18779)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-12 17:29:00 +01:00
Adrien Gallouët 8e649571cd
vendor : update cpp-httplib to 0.30.1 (#18771)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-12 15:58:52 +01:00