Pascal
5c28b7a2ee
chore: update webui build output
2026-01-17 18:38:50 +01:00
Pascal
fca7177eae
fix: ignore assistant attachments (MCP) for modality detection
2026-01-17 18:36:41 +01:00
Pascal
3572667788
chore: update webui build output
2026-01-17 16:35:54 +01:00
Pascal
506da17931
refactor: eliminate MCP circular dependency
...
- Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore
- Remove bidirectional callback pattern (set*Callback, notify* methods)
- Add updateState/updateHealthCheck public methods in mcpStore
- Replace callback calls with direct mcpStore method calls
- Remove unused imports (browser, HealthCheckState) and constructor
- Fixes CI: ReferenceError: Cannot access 'mcpClient' before initialization
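
A minimal sketch of the one-directional dependency described above, not the actual webui code. Only the names mcpStore, mcpClient, updateState and updateHealthCheck come from this commit; the class layout, the `ConnectionState` and `HealthCheck` types, and the `connect` method are assumptions made for illustration. The sketch uses constructor injection for brevity, whereas the real modules may be singletons.

```typescript
// Hypothetical shapes; the real webui types are not shown in this log.
type ConnectionState = 'idle' | 'connecting' | 'connected' | 'error';

interface HealthCheck {
  ok: boolean;
  checkedAt: number;
}

// The store has no reference to the client. It only exposes public
// update methods, so nothing here needs the client to be initialized.
class McpStore {
  state: ConnectionState = 'idle';
  healthCheck: HealthCheck | null = null;

  updateState(next: ConnectionState): void {
    this.state = next;
  }

  updateHealthCheck(result: HealthCheck): void {
    this.healthCheck = result;
  }
}

// The client depends on the store and calls it directly, replacing the
// earlier pattern of callbacks registered in both directions.
class McpClient {
  private readonly store: McpStore;

  constructor(store: McpStore) {
    this.store = store;
  }

  // `reachable` stands in for the outcome of a real connection attempt.
  connect(reachable: boolean): void {
    this.store.updateState('connecting');
    this.store.updateState(reachable ? 'connected' : 'error');
    this.store.updateHealthCheck({ ok: reachable, checkedAt: Date.now() });
  }
}

const mcpStore = new McpStore();
const mcpClient = new McpClient(mcpStore);
mcpClient.connect(true);
```

Because the store never imports the client, module evaluation order no longer matters, which is the kind of cycle that produces a "before initialization" ReferenceError.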
2026-01-17 16:30:42 +01:00
Pascal
9b3417703f
fix: remove obsolete modality UI tests causing CI failures
...
- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)
2026-01-17 16:30:36 +01:00
Pascal
a723238245
chore: update webui build output
2026-01-16 19:52:23 +01:00
Pascal
229aba7c3e
fix: strip reasoning content and UI proprietary tags from prompts
...
TODO: add toggle and ensure backend API compliance for reasoning format
2026-01-16 19:50:36 +01:00
Pascal
f09395821b
chore: update webui build output
2026-01-16 15:22:46 +01:00
Pascal
78c6380222
refactor: remove reasoning after first turn filter
2026-01-16 15:19:50 +01:00
Pascal
2973c64609
refactor: inline reasoning with tags, remove fixed thinking field
2026-01-16 15:19:42 +01:00
Pascal
a1550ab77d
chore: update webui build output
2026-01-16 11:02:17 +01:00
Pascal
db37b712b2
feat: resolve MCP attachment images via rehype plugin
...
The LLM can reference tool-generated images using markdown links.
The plugin resolves attachment names to base64 from message.extra when present;
regular HTTP/data URLs pass through unchanged (no regression).
- rehypeResolveAttachmentImages plugin in markdown pipeline
- Pass message prop to MarkdownContent and AgenticContent
- Force processor reactivity on message.extra changes
- Filter assistant images from API context (display-only)
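
A dependency-free sketch of the resolution step described above, not the actual rehypeResolveAttachmentImages plugin. The node interfaces are simplified stand-ins for hast nodes, and the `Attachment` shape (`name`, `base64Url`) is an assumption, since the layout of message.extra is not shown in this log.

```typescript
// Simplified stand-ins for hast nodes; a real rehype plugin would use
// the hast types and a tree visitor instead of this hand-rolled walk.
interface ElementNode {
  type: 'element';
  tagName: string;
  properties: Record<string, string>;
  children: TreeNode[];
}

interface TextNode {
  type: 'text';
  value: string;
}

type TreeNode = ElementNode | TextNode;

// Hypothetical attachment shape standing in for entries of message.extra.
interface Attachment {
  name: string;
  base64Url: string;
}

// Rewrites <img src="name"> to the matching attachment's data URL.
function resolveAttachmentImages(node: TreeNode, extra: Attachment[]): void {
  if (node.type !== 'element') return;

  if (node.tagName === 'img') {
    const src = node.properties.src ?? '';
    // Regular HTTP(S) and data URLs pass through unchanged.
    if (!/^(https?:|data:)/i.test(src)) {
      const match = extra.find((a) => a.name === src);
      // Unknown names are left as-is rather than guessed.
      if (match) node.properties.src = match.base64Url;
    }
  }

  for (const child of node.children) {
    resolveAttachmentImages(child, extra);
  }
}

const tree: ElementNode = {
  type: 'element',
  tagName: 'p',
  properties: {},
  children: [
    { type: 'element', tagName: 'img', properties: { src: 'plot.png' }, children: [] },
    { type: 'element', tagName: 'img', properties: { src: 'https://example.com/a.png' }, children: [] }
  ]
};

resolveAttachmentImages(tree, [
  { name: 'plot.png', base64Url: 'data:image/png;base64,AAAA' }
]);
```

After the call, the first image points at the attachment's data URL and the second keeps its original HTTPS source.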
2026-01-16 10:49:28 +01:00
Pascal
a3c2144c1d
feat: persist base64 attachments from tool results
2026-01-16 08:07:20 +01:00
Pascal
a377605f60
webui: fix custom headers persistence in UI (derived)
2026-01-15 20:36:14 +01:00
Pascal
3360f60b94
webui: fix custom headers persistence in UI
2026-01-15 20:13:01 +01:00
Aleksander Grygier
cffc3b46ae
fix: Word wrapping
2026-01-15 17:59:57 +01:00
Aleksander Grygier
5417a439ef
chore: update webui build output
2026-01-15 11:39:10 +01:00
Aleksander Grygier
30a585bb96
feat: UI improvements
2026-01-14 17:32:57 +01:00
Aleksander Grygier
886939c550
chore: update webui build output
2026-01-14 14:39:32 +01:00
Aleksander Grygier
39848ee12f
feat: UI improvement
2026-01-14 14:26:41 +01:00
Aleksander Grygier
c1ac8d7326
chore: update webui build output
2026-01-14 13:22:01 +01:00
Aleksander Grygier
afdae742e3
Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp
2026-01-14 13:20:25 +01:00
Aleksander Grygier
b11b32ea28
chore: update webui build output
2026-01-14 12:47:13 +01:00
Aleksander Grygier
06efeb6eb9
chore: update webui build output
2026-01-14 11:49:26 +01:00
Aleksander Grygier
f89bcb90ca
feat: MCP Server Details
2026-01-14 11:45:47 +01:00
Jeff Bolz
3e4bb29666
vulkan: Check maxStorageBufferRange in supports_op ( #18709 )
...
* vulkan: Check maxStorageBufferRange in supports_op
* skip maxStorageBufferRange check when shader64BitIndexing is enabled
2026-01-14 10:59:05 +01:00
Aman Gupta
47f9612492
llama-model: fix unfortunate typo ( #18832 )
2026-01-14 17:55:15 +08:00
Daniel Bevenius
01cbdfd7eb
CUDA : fix typo in clang pragma comment [no ci] ( #18830 )
2026-01-14 10:31:49 +01:00
Ruben Ortlam
635ef78ec5
vulkan: work around Intel fp16 bug in mmq ( #18814 )
2026-01-14 09:41:23 +01:00
Perry Naseck
7d587e5544
ggml-metal: do not copy headers for embedded, use current binary dir for embedded ( #18705 )
2026-01-14 09:22:25 +02:00
Daniel Benjaminsson
d34aa07193
mmap: add Haiku support by skipping RLIMIT_MEMLOCK check ( #18819 )
...
Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS.
Skip the resource limit check on Haiku to allow mlock functionality
to work without compile errors.
Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.
2026-01-14 09:11:05 +02:00
Adrien Gallouët
f709c7a33f
ci, tests : use cmake to download models and remove libcurl dependency ( #18791 )
...
* ci, tests : use cmake to download models and remove libcurl dependency
* llama_dl_model -> llama_download_model
* use EXPECTED_HASH for robust model downloading
* Move llama_download_model to cmake/common.cmake
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-14 07:46:27 +01:00
ddh0
6e36299b47
llama : print_info alignment fix ( #18708 )
...
* fix text spacing in print_info
* align all
2026-01-14 00:05:11 +01:00
Junwon Hwang
60591f01d4
model : add EXAONE MoE ( #18543 )
...
* Add EXAONE MoE implementations
Co-authored-by: Junwon Hwang <nuclear1221@gmail.com>
* Address PR feedback
* Address PR feedback
* [WIP] Add MTP for EXAONE-MoE
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
---------
Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>
2026-01-13 23:28:38 +01:00
Georgi Gerganov
e4832e3ae4
vocab : fix attribute overrides for harmony ( #18806 )
...
* vocab : fix attribute overrides for harmony
* cont : add warning log
2026-01-13 17:40:13 +02:00
Ruben Ortlam
960e5e3b46
llama-mmap: fix direct-io loading fallback EOF exception ( #18801 )
2026-01-13 15:57:07 +01:00
Daniel Bevenius
20ca2e12c4
model-conversion : remove -c 0 from model card template [no ci] ( #18807 )
...
This commit removes the `-c, --ctx-size N` option from the llama-server
command in the model card template for causal models.
The motivation is that `-c 0` is the default, so specifying it
is redundant.
2026-01-13 14:13:10 +01:00
yulo
ea4a321f2a
HIP: add fattn-mma-f16 for RDNA4 ( #18481 )
...
* finish VQ mma
* flash_attn_ext_f16_iter
* KQ_rowsum
* correct exp
* fix scale error
* fix softmax scale
* fix softmax scale
* enable fattn on cpu side
* fix random error
* disable fattn-mma-f16 on rdna3
* fix wrong col for rdna
* use identity mat to transpose
* resolve conflicts
* basic tuning for DeepSeek-R1-Distill-Qwen-1.5B
* fix volta compile error
* align rdna4 policy for fattn
* adjust fattn policy
* adjust kernel selection logic
* update as the review comments
* keep fattn-wmma logic
* adjust kernel selection logic
---------
Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-13 13:52:16 +01:00
Johannes Gäßler
c1e79e610f
doc: ban AI-generated PR descriptions [no ci] ( #18765 )
2026-01-13 13:43:12 +01:00
Xuan-Son Nguyen
e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly ( #18793 )
...
* mtmd: fix use_non_causal being reported incorrectly
* move clip_is_mrope to mtmd_decode_use_mrope
* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Georgi Gerganov
0a57271ab6
CUDA : fix unused argument when USE_CUDA_GRAPH=OFF ( #18800 )
2026-01-13 12:25:53 +02:00
Gabe Goodhart
076b0faf7d
graph : clean up t5 input builders ( #18795 )
...
* fix: Remove unnecessary `h` loops where `h` was only ever 0
Branch: CleanUpT5InputBuilders
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Remove unnecessary padding loop that is never hit anymore
The upper bound used to be GGML_PAD(n_tokens, GGML_KQ_MASK_PAD), but the
padding was removed in https://github.com/ggml-org/llama.cpp/pull/17910,
leaving the loop dead.
Branch: CleanUpT5InputBuilders
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2026-01-13 09:43:51 +01:00
Ruben Ortlam
db79dc06b1
llama-bench: add direct_io parameter ( #18778 )
2026-01-13 08:49:10 +01:00
Adrien Gallouët
537d4240d4
ci : remove libcurl in releases ( #18775 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-12 21:43:02 +01:00
Aleksander Grygier
120f3c978c
chore: update webui build output
2026-01-12 18:27:54 +01:00
Aleksander Grygier
5407b2efab
feat: MCP connection details WIP
2026-01-12 18:26:48 +01:00
Radoslav Gerganov
bcf7546160
server : add arg for disabling prompt caching ( #18776 )
...
* server : add arg for disabling prompt caching
Disabling prompt caching is useful for clients who are restricted to
sending only OpenAI-compat requests and want deterministic
responses.
* address review comments
* address review comments
2026-01-12 19:21:34 +02:00
Aleksander Grygier
0009c0c300
refactor: MCP types and health check
2026-01-12 18:12:08 +01:00
Adrien Gallouët
36c5913c45
ci : use openssl for openEuler-latest-cmake-cann ( #18779 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-12 17:29:00 +01:00
Adrien Gallouët
8e649571cd
vendor : update cpp-httplib to 0.30.1 ( #18771 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-12 15:58:52 +01:00