Aleksander Grygier
62ed7f112d
chore: update webui build output
2026-01-19 16:26:16 +01:00
Aleksander Grygier
d37683942b
fix: Missing onModelChange callback running assistant message re-generation
2026-01-19 16:25:49 +01:00
Pascal
d6dfe8e064
chore: update webui build output
2026-01-19 12:12:52 +01:00
Pascal
058929d453
fix: accurate tool_response display
2026-01-19 12:11:06 +01:00
Pascal
d92b621346
fix: unify MCP server label logic with simplified fallback
2026-01-18 13:10:03 +01:00
Pascal
16a03eea36
chore: update webui build output
2026-01-18 10:43:45 +01:00
Pascal
d8af98f1ed
refactor: remove multimodal validation from model selector
...
Remove all frontend validation logic that prevented users from selecting
models based on multimodal capabilities. This refactoring removes
restrictive UI code while maintaining full functionality:
- Vision models can describe images as text
- That text remains useful for non-vision models
- Chaining vision -> non-vision is a valid workflow
- Users know their use case better than the UI
- Users can return to vision models when needed
2026-01-18 10:42:01 +01:00
Pascal
5c28b7a2ee
chore: update webui build output
2026-01-17 18:38:50 +01:00
Pascal
fca7177eae
fix: ignore assistant attachments (MCP) for modality detection
2026-01-17 18:36:41 +01:00
Pascal
3572667788
chore: update webui build output
2026-01-17 16:35:54 +01:00
Pascal
506da17931
refactor: eliminate MCP circular dependency
...
- Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore
- Remove bidirectional callback pattern (set*Callback, notify* methods)
- Add updateState/updateHealthCheck public methods in mcpStore
- Replace callback calls with direct mcpStore method calls
- Remove unused imports (browser, HealthCheckState) and constructor
- Fixes CI: ReferenceError Cannot access mcpClient before initialization
2026-01-17 16:30:42 +01:00
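The unidirectional pattern this commit describes can be sketched as plain TypeScript. Only `updateState` and `updateHealthCheck` are named in the commit message; the state shape, the `connect` method, and all other details below are illustrative assumptions, not the actual webui code:

```typescript
// Hypothetical sketch of the unidirectional dependency: mcpClient -> mcpStore.
// The client calls public store methods directly; the store never registers
// callbacks on the client, so there is no circular import at module init.

type McpState = { connected: boolean; tools: string[] };

class McpStore {
  private state: McpState = { connected: false, tools: [] };
  private healthy = false;

  // Public mutators replace the old set*Callback / notify* pattern.
  updateState(partial: Partial<McpState>): void {
    this.state = { ...this.state, ...partial };
  }

  updateHealthCheck(healthy: boolean): void {
    this.healthy = healthy;
  }

  get snapshot(): McpState & { healthy: boolean } {
    return { ...this.state, healthy: this.healthy };
  }
}

// Single shared instance; the client depends on the store, never the reverse.
const mcpStore = new McpStore();

class McpClient {
  connect(tools: string[]): void {
    // Direct method calls instead of invoking store-registered callbacks.
    mcpStore.updateState({ connected: true, tools });
    mcpStore.updateHealthCheck(true);
  }
}
```

Breaking the cycle this way also fixes the initialization-order error the commit mentions: `mcpClient` can be constructed before the store has "registered" anything, because registration no longer exists.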
Pascal
9b3417703f
fix: remove obsolete modality UI tests causing CI failures
...
- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)
2026-01-17 16:30:36 +01:00
Pascal
a723238245
chore: update webui build output
2026-01-16 19:52:23 +01:00
Pascal
229aba7c3e
fix: strip reasoning content and UI proprietary tags from prompts
...
TODO: add toggle and ensure backend API compliance for reasoning format
2026-01-16 19:50:36 +01:00
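A minimal sketch of the stripping step described above, assuming reasoning is delimited by `<think>…</think>`-style tags; the actual tag set and the UI-proprietary markers are not specified in the commit, so the names here are assumptions:

```typescript
// Hypothetical sketch: drop reasoning blocks before message content is sent
// back to the model as prompt context. The <think> tag name is an assumption.
const REASONING_RE = /<think>[\s\S]*?<\/think>/g;

function stripForPrompt(content: string): string {
  return content.replace(REASONING_RE, "").trim();
}
```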
Pascal
f09395821b
chore: update webui build output
2026-01-16 15:22:46 +01:00
Pascal
78c6380222
refactor: remove reasoning after first turn filter
2026-01-16 15:19:50 +01:00
Pascal
2973c64609
refactor: inline reasoning with tags, remove fixed thinking field
2026-01-16 15:19:42 +01:00
Pascal
a1550ab77d
chore: update webui build output
2026-01-16 11:02:17 +01:00
Pascal
db37b712b2
feat: resolve MCP attachment images via rehype plugin
...
The LLM can reference tool-generated images using markdown links; the
plugin resolves attachment names to base64 from message.extra when present;
regular HTTP/data URLs pass through unchanged (no regression)
- rehypeResolveAttachmentImages plugin in markdown pipeline
- Pass message prop to MarkdownContent and AgenticContent
- Force processor reactivity on message.extra changes
- Filter assistant images from API context (display-only)
2026-01-16 10:49:28 +01:00
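The resolution step can be sketched as a walk over a hast-like tree that rewrites `<img>` sources naming a known attachment. The node and `message.extra` shapes below are assumptions; the real code runs as a rehype plugin in the markdown pipeline:

```typescript
// Hypothetical sketch of rehypeResolveAttachmentImages: rewrite img src
// values that match an attachment name, leave real URLs untouched.

interface HastNode {
  type: string;
  tagName?: string;
  properties?: { src?: string };
  children?: HastNode[];
}

type Attachments = Record<string, string>; // attachment name -> base64 payload

function resolveAttachmentImages(tree: HastNode, extra: Attachments): void {
  if (tree.tagName === "img" && tree.properties?.src) {
    const src = tree.properties.src;
    // Regular http(s)/data URLs pass through unchanged (no regression).
    if (!/^(https?:|data:)/.test(src) && extra[src]) {
      tree.properties.src = `data:image/png;base64,${extra[src]}`;
    }
  }
  for (const child of tree.children ?? []) {
    resolveAttachmentImages(child, extra);
  }
}
```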
Pascal
a3c2144c1d
feat: persist base64 attachments from tool results
2026-01-16 08:07:20 +01:00
Pascal
a377605f60
webui: fix custom headers persistence in UI (derived)
2026-01-15 20:36:14 +01:00
Pascal
3360f60b94
webui: fix custom headers persistence in UI
2026-01-15 20:13:01 +01:00
Aleksander Grygier
cffc3b46ae
fix: Word wrapping
2026-01-15 17:59:57 +01:00
Aleksander Grygier
5417a439ef
chore: update webui build output
2026-01-15 11:39:10 +01:00
Aleksander Grygier
30a585bb96
feat: UI improvements
2026-01-14 17:32:57 +01:00
Aleksander Grygier
886939c550
chore: update webui build output
2026-01-14 14:39:32 +01:00
Aleksander Grygier
39848ee12f
feat: UI improvement
2026-01-14 14:26:41 +01:00
Aleksander Grygier
c1ac8d7326
chore: update webui build output
2026-01-14 13:22:01 +01:00
Aleksander Grygier
afdae742e3
Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp
2026-01-14 13:20:25 +01:00
Aleksander Grygier
b11b32ea28
chore: update webui build output
2026-01-14 12:47:13 +01:00
Aleksander Grygier
06efeb6eb9
chore: update webui build output
2026-01-14 11:49:26 +01:00
Aleksander Grygier
f89bcb90ca
feat: MCP Server Details
2026-01-14 11:45:47 +01:00
Jeff Bolz
3e4bb29666
vulkan: Check maxStorageBufferRange in supports_op ( #18709 )
...
* vulkan: Check maxStorageBufferRange in supports_op
* skip maxStorageBufferRange check when shader64BitIndexing is enabled
2026-01-14 10:59:05 +01:00
Aman Gupta
47f9612492
llama-model: fix unfortunate typo ( #18832 )
2026-01-14 17:55:15 +08:00
Daniel Bevenius
01cbdfd7eb
CUDA : fix typo in clang pragma comment [no ci] ( #18830 )
2026-01-14 10:31:49 +01:00
Ruben Ortlam
635ef78ec5
vulkan: work around Intel fp16 bug in mmq ( #18814 )
2026-01-14 09:41:23 +01:00
Perry Naseck
7d587e5544
ggml-metal: do not copy headers for embedded, use current binary dir for embedded ( #18705 )
2026-01-14 09:22:25 +02:00
Daniel Benjaminsson
d34aa07193
mmap: add Haiku support by skipping RLIMIT_MEMLOCK check ( #18819 )
...
Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS.
Skip the resource limit check on Haiku to allow mlock functionality
to work without compile errors.
Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.
2026-01-14 09:11:05 +02:00
Adrien Gallouët
f709c7a33f
ci, tests : use cmake to download models and remove libcurl dependency ( #18791 )
...
* ci, tests : use cmake to download models and remove libcurl dependency
* llama_dl_model -> llama_download_model
* use EXPECTED_HASH for robust model downloading
* Move llama_download_model to cmake/common.cmake
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-14 07:46:27 +01:00
ddh0
6e36299b47
llama : print_info alignment fix ( #18708 )
...
* fix text spacing in print_info
* align all
2026-01-14 00:05:11 +01:00
Junwon Hwang
60591f01d4
model : add EXAONE MoE ( #18543 )
...
* Add EXAONE MoE implementations
Co-authored-by: Junwon Hwang <nuclear1221@gmail.com>
* Address PR feedback
* Address PR feedback
* [WIP] Add MTP for EXAONE-MoE
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
* Address PR feedback
---------
Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>
2026-01-13 23:28:38 +01:00
Georgi Gerganov
e4832e3ae4
vocab : fix attribute overrides for harmony ( #18806 )
...
* vocab : fix attribute overrides for harmony
* cont : add warning log
2026-01-13 17:40:13 +02:00
Ruben Ortlam
960e5e3b46
llama-mmap: fix direct-io loading fallback EOF exception ( #18801 )
2026-01-13 15:57:07 +01:00
Daniel Bevenius
20ca2e12c4
model-conversion : remove -c 0 from model card template [no ci] ( #18807 )
...
This commit removes the `-c, --ctx-size N` flag from the llama-server
command in the model card template for causal models.
The motivation for this is that -c 0 is the default and specifying it
is redundant.
2026-01-13 14:13:10 +01:00
yulo
ea4a321f2a
HIP: add fattn-mma-f16 for RDNA4 ( #18481 )
...
* finish VQ mma
* flash_attn_ext_f16_iter
* KQ_rowsum
* correct exp
* fix scale error
* fix softmax scale
* fix softmax scale
* enable fattn on cpu side
* fix random error
* disable fattn-mma-f16 on rdna3
* fix wrong col for rdna
* use identity mat to transpose
* resolve conflicts
* basic tuning for DeepSeek-R1-Distill-Qwen-1.5B
* fix volta compile error
* align rdna4 policy for fattn
* adjust fattn policy
* adjust kernel selection logic
* update as the review comments
* keep fattn-wmma logic
* adjust kernel selection logic
---------
Co-authored-by: zhang hui <you@example.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-13 13:52:16 +01:00
Johannes Gäßler
c1e79e610f
doc: ban AI-generated PR descriptions [no ci] ( #18765 )
2026-01-13 13:43:12 +01:00
Xuan-Son Nguyen
e047f9ee9d
mtmd: fix use_non_causal being reported incorrectly ( #18793 )
...
* mtmd: fix use_non_causal being reported incorrectly
* move clip_is_mrope to mtmd_decode_use_mrope
* fix sloppy code ggml_cpy
2026-01-13 12:19:38 +01:00
Georgi Gerganov
0a57271ab6
CUDA : fix unused argument when USE_CUDA_GRAPH=OFF ( #18800 )
2026-01-13 12:25:53 +02:00
Gabe Goodhart
076b0faf7d
graph : clean up t5 input builders ( #18795 )
...
* fix: Remove unnecessary `h` loops where `h` was only ever 0
Branch: CleanUpT5InputBuilders
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Remove unnecessary padding loop that is never hit anymore
The upper bound previously used GGML_PAD(n_tokens, GGML_KQ_MASK_PAD), but the
padding was removed in https://github.com/ggml-org/llama.cpp/pull/17910, leaving the
loop dead.
Branch: CleanUpT5InputBuilders
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2026-01-13 09:43:51 +01:00
Ruben Ortlam
db79dc06b1
llama-bench: add direct_io parameter ( #18778 )
2026-01-13 08:49:10 +01:00