Jeff Bolz
ea13cba850
vulkan: support buffer_from_host_ptr ( #18467 )
* vulkan: support buffer_from_host_ptr
* hacky use of buffer_from_host_ptr for directio
* disable buffer_from_host_ptr cap
* use external memory for ggml_vk_host_malloc, revert model loader changes
* disable external_memory_host for MoltenVK
* take buffer memory types into account
* don't use external_memory_host for ggml_vk_host_malloc
2026-01-06 17:37:07 +01:00
Aman Gupta
090b137e56
ggml-cuda: refactor cuda graph usage ( #18637 )
* ggml-cuda: refactor cuda graph usage
* use is_enabled() instead of enabled
2026-01-06 23:48:45 +08:00
Beinsezii
968929528c
mmq.cu: tune mmq/rocblas switching for RDNA ( #18537 )
* Patch perf regression for mmq kernels in ROCm
recover performance regression for https://github.com/ggml-org/llama.cpp/issues/17917
* add n_experts branch like the cdna path
* mmq.cu: tune mmq/wmma switching for RDNA
* mmq.cu: move amd wmma mmq/wmma switching behind IS_RDNA3
* Update ggml/src/ggml-cuda/mmq.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Jiacheng (Jason) Chen <76919340+jiachengjason@users.noreply.github.com>
Co-authored-by: jiachengjason <jasonchen.jiacheng@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-06 16:26:07 +01:00
R
3d26a09dc7
server : add thinking content blocks to Anthropic Messages API ( #18551 )
* server : add thinking content blocks to Anthropic Messages API
Add support for returning reasoning/thinking content in Anthropic API
responses when using models with --reasoning-format deepseek and the
thinking parameter enabled.
- Non-streaming: adds thinking block before text in content array
- Streaming: emits thinking_delta events with correct block indices
- Partial streaming: tracks reasoning state across chunks via
anthropic_has_reasoning member variable
Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.
* server : fix Anthropic API streaming for thinking content blocks
Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.
* server: refactor Anthropic streaming state to avoid raw pointer
Replace raw pointer to task_result_state with direct field copies:
- Copy state fields in update() before processing chunk
- Use local copies in to_json_anthropic() instead of dereferencing
- Pre-compute state updates for next chunk in update()
This makes the data flow clearer and avoids unsafe pointer patterns.
2026-01-06 16:17:13 +01:00
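The streaming behaviour described in this commit (one thinking block before the text block, correct block indices, no duplicate content_block_start events) can be illustrated with a minimal sketch. This is not the server's code: the function name `anthropic_stream_events` and the `(kind, text)` chunk format are hypothetical, and the signature field and message-level events are left out.

```python
from typing import Iterable, Iterator


def anthropic_stream_events(chunks: Iterable[tuple[str, str]]) -> Iterator[dict]:
    """Turn (kind, text) chunks into Anthropic-style content block events.

    kind is "reasoning" or "text" (a hypothetical input format).
    A content_block_start is emitted only when the kind changes, so each
    block is opened exactly once and gets its own index.
    """
    index = -1        # index of the currently open content block
    open_kind = None  # kind of the currently open block, if any

    for kind, text in chunks:
        if kind != open_kind:
            # Close the previous block before opening a new one.
            if open_kind is not None:
                yield {"type": "content_block_stop", "index": index}
            index += 1
            open_kind = kind
            if kind == "reasoning":
                block = {"type": "thinking", "thinking": ""}
            else:
                block = {"type": "text", "text": ""}
            yield {"type": "content_block_start", "index": index, "content_block": block}

        if kind == "reasoning":
            delta = {"type": "thinking_delta", "thinking": text}
        else:
            delta = {"type": "text_delta", "text": text}
        yield {"type": "content_block_delta", "index": index, "delta": delta}

    # Close whichever block is still open at the end of the stream.
    if open_kind is not None:
        yield {"type": "content_block_stop", "index": index}
```

Tracking the open block kind across chunks plays the role that the commit assigns to the reasoning state carried between partial streaming updates.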
Christian Kastner
bd2a93d475
gguf-py : add requests to dependencies ( #18629 )
2026-01-06 08:56:38 +01:00
Adrien Gallouët
e75ee11024
ggml : fix avx512bf16 build ( #18623 )
- include `immintrin.h` when required
- remove unused m512bh
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-06 08:54:10 +02:00
Raul Torres
da9b8d3300
CANN: Make `valid_values` variable `static const` ( #18627 )
2026-01-06 11:53:28 +08:00
nwyin
e443fbcfa5
ggml webgpu: add CEIL operation support ( #18605 )
* ggml-webgpu: add CEIL operation support
Add support for the CEIL unary operation in the WebGPU backend:
- Add CEIL_FUNC shader template in unary_op.wgsl
- Add 4 shader variants (f32, f16, inplace versions)
- Initialize CEIL pipelines in ggml-webgpu.cpp
- Register CEIL in supports_op function
* docs: update WebGPU ops support for CEIL
2026-01-05 11:38:57 -08:00
Tarek Dakhran
73d284a250
model : add LFM2-ColBert-350M ( #18607 )
* model : add LFM2-ColBert-350M
* llama_model_n_embd_out() - returns `hparams.n_embd_out` if set and falls back to `hparams.n_embd`
2026-01-05 19:52:56 +01:00
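The fallback rule stated for llama_model_n_embd_out() can be sketched as follows. This is a Python illustration only, not the C++ implementation: the `HParams` class is hypothetical, and treating 0 as "not set" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HParams:
    n_embd: int          # hidden embedding size
    n_embd_out: int = 0  # output embedding size; 0 means "not set" (assumption)


def n_embd_out(hparams: HParams) -> int:
    """Return n_embd_out when it is set, otherwise fall back to n_embd."""
    return hparams.n_embd_out if hparams.n_embd_out > 0 else hparams.n_embd
```

A model whose output embeddings are projected to a different width than its hidden size would set `n_embd_out`; every other model keeps reporting `n_embd`.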
Johannes Gäßler
df17a4c94f
CUDA: fix FA FP16 accumulator overflow for Granite ( #18614 )
2026-01-05 19:51:13 +01:00
tt
1871f0ba56
add YoutuVLForConditionalGeneration architectures ( #18620 )
* Support Youtu-VL Model
---------
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-05 18:15:14 +01:00
Aman Gupta
f47edb8c19
ggml-cuda: check for srcs outside the cgraph ( #18583 )
* ggml-cuda: check for srcs outside the cgraph
* review: use leafs instead
2026-01-05 22:46:36 +08:00
Aleksander Grygier
2d6020b574
feat: Enable adding System Prompt per-chat
2026-01-05 14:30:11 +01:00
Vladislav Sayapin
da143b9940
server : fix router child env in containerized environments ( #18562 )
2026-01-05 14:12:05 +01:00
Aleksander Grygier
469263668f
fix: UI
2026-01-05 11:59:31 +01:00
Aleksander Grygier
cf37390434
chore: update webui build output
2026-01-05 11:57:23 +01:00
Aleksander Grygier
f3734b5b7c
feat: UI improvements
2026-01-05 11:53:53 +01:00
Jeff Bolz
f1768d8f03
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 ( #18582 )
2026-01-05 11:51:39 +01:00
Georgi Gerganov
2da64a2f8a
models : fix backend assignment for Granite/Nemotron graphs ( #18599 )
* models : fix backend assignment for Granite/Nemotron graphs
* cont : add ref
* cont : move call to build_inp_embd()
2026-01-05 12:34:23 +02:00
Jeff Bolz
b37124d2d2
vulkan: handle quantize_q8_1 overflowing the max workgroup count ( #18515 )
* vulkan: handle quantize_q8_1 overflowing the max workgroup count
* vulkan: Fix small tile size matmul on lavapipe
* fix mul_mat_id failures
2026-01-05 11:30:14 +01:00
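For context on the workgroup count overflow named in this commit: a compute dispatch is limited per dimension, so a 1D workgroup count larger than the device limit has to be handled somehow. One common approach, sketched below, folds the count into two dimensions. This is an assumption about a general technique, not a description of what the commit does; `fold_workgroups` is a hypothetical helper.

```python
def fold_workgroups(total: int, max_count: int) -> tuple[int, int]:
    """Fold a 1D workgroup count into (x, y) with x <= max_count and y <= max_count.

    The returned grid satisfies x * y >= total, so the shader must recompute
    a linear index (for example y_id * x + x_id) and skip indices >= total.
    """
    if total <= max_count:
        return (total, 1)
    y = -(-total // max_count)  # ceil(total / max_count)
    if y > max_count:
        raise ValueError("workgroup count does not fit in two dimensions")
    x = -(-total // y)          # ceil(total / y), which is <= max_count
    return (x, y)
```

Balancing x against y rather than pinning x at the limit keeps the number of surplus workgroups small.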
Sigbjørn Skjæret
eadc4184ca
llama : refactor rope_freq_base/scale_swa conversion and init ( #18553 )
* refactor rope_freq_base/scale_swa conversion and init
* safe defaults for unknowns
* update relevant models
* grammar
* add get_rope_freq_scale to modern-bert
* const
* const
* log swa info
2026-01-05 09:14:04 +01:00
Pascal
653f85fedd
webui: raw tool result display, strip only leading/trailing newlines to preserve indentation
2026-01-05 09:01:31 +01:00
Pascal
fc7218ae11
webui: split raw output into backend parsing and frontend display options
2026-01-05 09:01:31 +01:00
Pascal
4f9d9d41b9
webui: remove legacy wrapper and restore WebSocket transport
2026-01-05 09:01:31 +01:00
Pascal
183d9eebff
webui: remove unused imports
2026-01-05 09:01:31 +01:00
Aleksander Grygier
f7ea69fa18
chore: update webui build output
2026-01-05 09:01:31 +01:00
Aleksander Grygier
c5d01fbb8f
feat: Improve agentic tool call streaming display with 'in progress' state
2026-01-05 09:01:31 +01:00
Aleksander Grygier
f755673c6f
feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides
2026-01-05 09:01:31 +01:00
Aleksander Grygier
81ad2d5569
feat: Add per-chat MCP server overrides
2026-01-05 09:01:31 +01:00
Aleksander Grygier
865c28a96d
chore: update webui build output
2026-01-05 09:01:31 +01:00
Aleksander Grygier
2592471d11
feat: Add image load error fallback in MarkdownContent
2026-01-05 09:01:31 +01:00
Aleksander Grygier
069be7b517
feat: Implement lazy MCP client shutdown
2026-01-05 09:01:31 +01:00
Aleksander Grygier
9571e07687
feat: Enhance tool call streaming UI and output format
2026-01-05 09:01:31 +01:00
Aleksander Grygier
260375819d
feat: Display and manage servers in ChatForm actions
2026-01-05 09:01:31 +01:00
Aleksander Grygier
74345d8785
feat: Integrate server management dialog into chat settings
2026-01-05 09:01:31 +01:00
Aleksander Grygier
dde5e1582c
feat: Implement dedicated server management UI components
2026-01-05 09:01:31 +01:00
Aleksander Grygier
c24d5e36f0
refactor: Centralize health check logic in store
2026-01-05 09:01:31 +01:00
Aleksander Grygier
f87b10ee66
feat: Enhance server config with headers and schema normalization
2026-01-05 09:01:31 +01:00
Aleksander Grygier
778ad550b1
feat: Add McpLogo Svelte component
2026-01-05 09:01:31 +01:00
Aleksander Grygier
c1c2234a62
refactor: Consolidate UI CSS classes into shared module
2026-01-05 09:01:31 +01:00
Aleksander Grygier
883d2a4f15
chore: update webui build output
2026-01-05 09:01:31 +01:00
Aleksander Grygier
7d5fd37324
feat: Raw LLM output switch per message
2026-01-05 09:01:31 +01:00
Aleksander Grygier
03464a0780
refactor: Tool call handling
2026-01-05 09:01:31 +01:00
Aleksander Grygier
3e7318f09d
docs: Update high-level architecture diagrams for MCP integration
2026-01-05 09:01:15 +01:00
Aleksander Grygier
219be7807e
feat: Add AgenticContent component for enhanced tool call rendering
2026-01-05 09:01:15 +01:00
Aleksander Grygier
52b1a1bffa
refactor: Update ChatStore to leverage mcpStore for agentic flow
2026-01-05 09:01:15 +01:00
Aleksander Grygier
60475dca3c
feat: Implement agentic orchestration within ChatService
2026-01-05 09:01:15 +01:00
Aleksander Grygier
5f5d5ab45f
feat: Introduce reactive mcpStore for client lifecycle management
2026-01-05 09:01:15 +01:00
Aleksander Grygier
9ab2326e79
feat: Refactor MCP client to use official SDK
2026-01-05 09:01:15 +01:00
Aleksander Grygier
4dbcb5cdfd
feat: Add @modelcontextprotocol/sdk and zod dependencies
2026-01-05 09:01:15 +01:00