8035 Commits

Author SHA1 Message Date
Jeff Bolz ea13cba850
vulkan: support buffer_from_host_ptr (#18467)
* vulkan: support buffer_from_host_ptr

* hacky use of buffer_from_host_ptr for directio

* disable buffer_from_host_ptr cap

* use external memory for ggml_vk_host_malloc, revert model loader changes

* disable external_memory_host for MoltenVK

* take buffer memory types into account

* don't use external_memory_host for ggml_vk_host_malloc
2026-01-06 17:37:07 +01:00
Aman Gupta 090b137e56
ggml-cuda: refactor cuda graph usage (#18637)
* ggml-cuda: refactor cuda graph usage

* use is_enabled() instead of enabled
2026-01-06 23:48:45 +08:00
Beinsezii 968929528c
mmq.cu: tune mmq/rocblas switching for RDNA (#18537)
* Patch perf regression for mmq kernels in ROCm

recover performance regression for https://github.com/ggml-org/llama.cpp/issues/17917

* add n_experts branch like the cdna path

* mmq.cu: tune mmq/wmma switching for RDNA

* mmq.cu: move amd wmma mmq/wmma switching behind IS_RDNA3

* Update ggml/src/ggml-cuda/mmq.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Jiacheng (Jason) Chen <76919340+jiachengjason@users.noreply.github.com>
Co-authored-by: jiachengjason <jasonchen.jiacheng@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-06 16:26:07 +01:00
R 3d26a09dc7
server : add thinking content blocks to Anthropic Messages API (#18551)
* server : add thinking content blocks to Anthropic Messages API

Add support for returning reasoning/thinking content in Anthropic API
responses when using models with --reasoning-format deepseek and the
thinking parameter enabled.

- Non-streaming: adds thinking block before text in content array
- Streaming: emits thinking_delta events with correct block indices
- Partial streaming: tracks reasoning state across chunks via
  anthropic_has_reasoning member variable

Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.

* server : fix Anthropic API streaming for thinking content blocks

Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.

* server: refactor Anthropic streaming state to avoid raw pointer

Replace raw pointer to task_result_state with direct field copies:
- Copy state fields in update() before processing chunk
- Use local copies in to_json_anthropic() instead of dereferencing
- Pre-compute state updates for next chunk in update()

This makes the data flow clearer and avoids unsafe pointer patterns.
2026-01-06 16:17:13 +01:00
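
The block ordering described in the commit above (thinking block before text in the non-streaming case, `thinking_delta` events with their own block index when streaming) can be illustrated with a minimal sketch. This is not the llama.cpp server code: the helper names `build_content` and `stream_events` are hypothetical, the event and field names follow the publicly documented Anthropic Messages API shapes, and the empty `signature` value is an assumption.

```python
from typing import Iterator


def build_content(reasoning: str, text: str) -> list[dict]:
    """Non-streaming case: the thinking block, if any, precedes the text block."""
    content: list[dict] = []
    if reasoning:
        # Assumption: the signature is left empty; the commit only says a
        # signature field was added, not what it contains.
        content.append({"type": "thinking", "thinking": reasoning, "signature": ""})
    content.append({"type": "text", "text": text})
    return content


def stream_events(reasoning_chunks: list[str], text_chunks: list[str]) -> Iterator[dict]:
    """Streaming case: each content block gets exactly one start event and its own index."""
    index = 0
    if reasoning_chunks:
        yield {"type": "content_block_start", "index": index,
               "content_block": {"type": "thinking", "thinking": ""}}
        for chunk in reasoning_chunks:
            yield {"type": "content_block_delta", "index": index,
                   "delta": {"type": "thinking_delta", "thinking": chunk}}
        yield {"type": "content_block_stop", "index": index}
        index += 1  # the text block follows at the next index
    yield {"type": "content_block_start", "index": index,
           "content_block": {"type": "text", "text": ""}}
    for chunk in text_chunks:
        yield {"type": "content_block_delta", "index": index,
               "delta": {"type": "text_delta", "text": chunk}}
    yield {"type": "content_block_stop", "index": index}
```

When no reasoning is produced, the text block stays at index 0, so clients that ignore thinking content see the same event sequence as before.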
Christian Kastner bd2a93d475
gguf-py : add requests to dependencies (#18629) 2026-01-06 08:56:38 +01:00
Adrien Gallouët e75ee11024
ggml : fix avx512bf16 build (#18623)
- include `immintrin.h` when required
- remove unused m512bh

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-01-06 08:54:10 +02:00
Raul Torres da9b8d3300
CANN: Make `valid_values` variable `static const` (#18627) 2026-01-06 11:53:28 +08:00
nwyin e443fbcfa5
ggml webgpu: add CEIL operation support (#18605)
* ggml-webgpu: add CEIL operation support

Add support for the CEIL unary operation in the WebGPU backend:
- Add CEIL_FUNC shader template in unary_op.wgsl
- Add 4 shader variants (f32, f16, inplace versions)
- Initialize CEIL pipelines in ggml-webgpu.cpp
- Register CEIL in supports_op function

* docs: update WebGPU ops support for CEIL
2026-01-05 11:38:57 -08:00
Tarek Dakhran 73d284a250
model : add LFM2-ColBert-350M (#18607)
* model : add LFM2-ColBert-350M

* llama_model_n_embd_out() - returns `hparams.n_embd_out` if set and falls back to `hparams.n_embd` otherwise
2026-01-05 19:52:56 +01:00
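
The fallback behaviour of `llama_model_n_embd_out()` described in the commit above can be sketched as follows. This is an illustration only, written in Python rather than the project's C++: the `HParams` class and the `model_n_embd_out` function are hypothetical stand-ins, and treating `0` as "not set" is an assumption rather than something the commit message states.

```python
from dataclasses import dataclass


@dataclass
class HParams:
    n_embd: int          # hidden embedding size of the model
    n_embd_out: int = 0  # output embedding size; 0 is assumed to mean "not set"


def model_n_embd_out(hparams: HParams) -> int:
    """Return the output embedding size, falling back to n_embd when unset."""
    if hparams.n_embd_out > 0:
        return hparams.n_embd_out
    return hparams.n_embd
```

A model whose output projection changes the embedding width would report the projected size, while every other model keeps reporting `n_embd` unchanged.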
Johannes Gäßler df17a4c94f
CUDA: fix FA FP16 accumulator overflow for Granite (#18614) 2026-01-05 19:51:13 +01:00
tt 1871f0ba56
add YoutuVLForConditionalGeneration architectures (#18620)
* Support Youtu-VL Model

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-05 18:15:14 +01:00
Aman Gupta f47edb8c19
ggml-cuda: check for srcs outside the cgraph (#18583)
* ggml-cuda: check for srcs outside the cgraph

* review: use leafs instead
2026-01-05 22:46:36 +08:00
Aleksander Grygier 2d6020b574 feat: Enable adding System Prompt per-chat 2026-01-05 14:30:11 +01:00
Vladislav Sayapin da143b9940
server : fix router child env in containerized environments (#18562) 2026-01-05 14:12:05 +01:00
Aleksander Grygier 469263668f fix: UI 2026-01-05 11:59:31 +01:00
Aleksander Grygier cf37390434 chore: update webui build output 2026-01-05 11:57:23 +01:00
Aleksander Grygier f3734b5b7c feat: UI improvements 2026-01-05 11:53:53 +01:00
Jeff Bolz f1768d8f03
vulkan: fix topk_moe_sigmoid_norm_bias failures in GLM-4.6 (#18582) 2026-01-05 11:51:39 +01:00
Georgi Gerganov 2da64a2f8a
models : fix backend assignment for Granite/Nemotron graphs (#18599)
* models : fix backend assignment for Granite/Nemotron graphs

* cont : add ref

* cont : move call to build_inp_embd()
2026-01-05 12:34:23 +02:00
Jeff Bolz b37124d2d2
vulkan: handle quantize_q8_1 overflowing the max workgroup count (#18515)
* vulkan: handle quantize_q8_1 overflowing the max workgroup count

* vulkan: Fix small tile size matmul on lavapipe

* fix mul_mat_id failures
2026-01-05 11:30:14 +01:00
Sigbjørn Skjæret eadc4184ca
llama : refactor rope_freq_base/scale_swa conversion and init (#18553)
* refactor rope_freq_base/scale_swa conversion and init

* safe defaults for unknowns

* update relevant models

* grammar

* add get_rope_freq_scale to modern-bert

* const

* const

* log swa info
2026-01-05 09:14:04 +01:00
Pascal 653f85fedd webui: raw tool result display, strip only leading/trailing newlines to preserve indentation 2026-01-05 09:01:31 +01:00
Pascal fc7218ae11 webui: split raw output into backend parsing and frontend display options 2026-01-05 09:01:31 +01:00
Pascal 4f9d9d41b9 webui: remove legacy wrapper and restore WebSocket transport 2026-01-05 09:01:31 +01:00
Pascal 183d9eebff webui: remove unused imports 2026-01-05 09:01:31 +01:00
Aleksander Grygier f7ea69fa18 chore: update webui build output 2026-01-05 09:01:31 +01:00
Aleksander Grygier c5d01fbb8f feat: Improve agentic tool call streaming display with 'in progress' state 2026-01-05 09:01:31 +01:00
Aleksander Grygier f755673c6f feat: Enhance MCP server dropdown with search, popularity sorting, and per-chat overrides 2026-01-05 09:01:31 +01:00
Aleksander Grygier 81ad2d5569 feat: Add per-chat MCP server overrides 2026-01-05 09:01:31 +01:00
Aleksander Grygier 865c28a96d chore: update webui build output 2026-01-05 09:01:31 +01:00
Aleksander Grygier 2592471d11 feat: Add image load error fallback in MarkdownContent 2026-01-05 09:01:31 +01:00
Aleksander Grygier 069be7b517 feat: Implement lazy MCP client shutdown 2026-01-05 09:01:31 +01:00
Aleksander Grygier 9571e07687 feat: Enhance tool call streaming UI and output format 2026-01-05 09:01:31 +01:00
Aleksander Grygier 260375819d feat: Display and manage servers in ChatForm actions 2026-01-05 09:01:31 +01:00
Aleksander Grygier 74345d8785 feat: Integrate server management dialog into chat settings 2026-01-05 09:01:31 +01:00
Aleksander Grygier dde5e1582c feat: Implement dedicated server management UI components 2026-01-05 09:01:31 +01:00
Aleksander Grygier c24d5e36f0 refactor: Centralize health check logic in store 2026-01-05 09:01:31 +01:00
Aleksander Grygier f87b10ee66 feat: Enhance server config with headers and schema normalization 2026-01-05 09:01:31 +01:00
Aleksander Grygier 778ad550b1 feat: Add McpLogo Svelte component 2026-01-05 09:01:31 +01:00
Aleksander Grygier c1c2234a62 refactor: Consolidate UI CSS classes into shared module 2026-01-05 09:01:31 +01:00
Aleksander Grygier 883d2a4f15 chore: update webui build output 2026-01-05 09:01:31 +01:00
Aleksander Grygier 7d5fd37324 feat: Raw LLM output switch per message 2026-01-05 09:01:31 +01:00
Aleksander Grygier 03464a0780 refactor: Tool call handling 2026-01-05 09:01:31 +01:00
Aleksander Grygier 3e7318f09d docs: Update high-level architecture diagrams for MCP integration 2026-01-05 09:01:15 +01:00
Aleksander Grygier 219be7807e feat: Add AgenticContent component for enhanced tool call rendering 2026-01-05 09:01:15 +01:00
Aleksander Grygier 52b1a1bffa refactor: Update ChatStore to leverage mcpStore for agentic flow 2026-01-05 09:01:15 +01:00
Aleksander Grygier 60475dca3c feat: Implement agentic orchestration within ChatService 2026-01-05 09:01:15 +01:00
Aleksander Grygier 5f5d5ab45f feat: Introduce reactive mcpStore for client lifecycle management 2026-01-05 09:01:15 +01:00
Aleksander Grygier 9ab2326e79 feat: Refactor MCP client to use official SDK 2026-01-05 09:01:15 +01:00
Aleksander Grygier 4dbcb5cdfd feat: Add @modelcontextprotocol/sdk and zod dependencies 2026-01-05 09:01:15 +01:00