Commit Graph

7978 Commits

Author SHA1 Message Date
Aleksander Grygier ff0e927be2 chore: update webui build output 2026-01-25 13:38:25 +01:00
Aleksander Grygier ee9efae203 refactor: Enums 2026-01-25 13:37:08 +01:00
Aleksander Grygier 7f5284d597 refactor: Cleanup
2026-01-25 13:13:11 +01:00
Aleksander Grygier 97642211a9 chore: update webui build output 2026-01-25 02:10:25 +01:00
Aleksander Grygier fc377123b7 refactor: Simplify MCP errors 2026-01-25 02:09:12 +01:00
Aleksander Grygier 202262c2dc chore: update webui build output 2026-01-25 01:44:14 +01:00
Aleksander Grygier b58b823b57 refactor: Types 2026-01-25 01:39:49 +01:00
Aleksander Grygier ba39f8cc7b chore: update webui build output 2026-01-25 01:21:34 +01:00
Aleksander Grygier 9bcfdc3483 refactor: DRY 2026-01-25 01:17:59 +01:00
Aleksander Grygier e7ff091881 chore: Add deprecation comment 2026-01-25 01:05:28 +01:00
Aleksander Grygier 1c843b2863 chore: update webui build output 2026-01-25 01:04:34 +01:00
Aleksander Grygier 5dfc520d67 refactor: Cleanup 2026-01-25 00:48:21 +01:00
Aleksander Grygier 6daa39994c refactor: Naming & Enums 2026-01-25 00:32:37 +01:00
Aleksander Grygier 2562dc50bd chore: update webui build output 2026-01-25 00:32:16 +01:00
Aleksander Grygier 372202632e refactor: Cleanup 2026-01-25 00:31:49 +01:00
Aleksander Grygier ba230c5cce refactor: Naming + remove redundant component 2026-01-24 23:58:17 +01:00
Aleksander Grygier f7b5f62586 refactor: Remove unused code 2026-01-24 23:45:06 +01:00
Aleksander Grygier 22d9e645aa chore: update webui build output 2026-01-24 23:39:04 +01:00
Aleksander Grygier d938994395 refactor: Cleanup 2026-01-24 23:38:37 +01:00
Aleksander Grygier fc4c392dce chore: update webui build output 2026-01-24 20:54:24 +01:00
Aleksander Grygier 79e606eb99 refactor: Constants 2026-01-24 20:52:19 +01:00
Aleksander Grygier 3d7426cdd4 refactor: Cleanup 2026-01-24 20:47:32 +01:00
Aleksander Grygier 8bf2d38da1 chore: update webui build output 2026-01-24 20:32:53 +01:00
Aleksander Grygier 14911e51fc feat: MCP Prompts implementation improvements 2026-01-24 20:30:52 +01:00
Aleksander Grygier 801ef93522 refactor: Message Height CSS Variable 2026-01-24 19:15:38 +01:00
Aleksander Grygier 13f756421c refactor: Enums 2026-01-24 18:37:43 +01:00
Pascal 85b8da45f9 fix: resolve TypeScript error in tool response content 2026-01-24 18:04:01 +01:00
Pascal 9ddc54b668 webui: enable vision in agentic tool responses
- Include images from all message roles (not just user)
- Add multipart content support for tool responses
- Images from MCP tools now accessible in same agentic turn
2026-01-24 17:58:20 +01:00
Aleksander Grygier 172e93d494 Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp 2026-01-24 15:13:58 +01:00
Aleksander Grygier da9c245838 chore: update webui build output 2026-01-24 13:59:52 +01:00
Aleksander Grygier 7c4bedda87 feat: Improve formatting performance time 2026-01-24 13:58:23 +01:00
Aleksander Grygier c39c6ef436 fix: System prompt sorting 2026-01-24 13:44:41 +01:00
Aleksander Grygier 2601bf0f59 fix: Save draft message in Chat Form when adding System Prompt from new chat view 2026-01-24 13:32:49 +01:00
Aleksander Grygier a647edfc0b fix: Chat Form submission 2026-01-24 12:33:24 +01:00
Johannes Gäßler 8f91ca54ec CUDA: re-use MLA K data for V in MMA FA (#19057) 2026-01-24 10:09:36 +01:00
Aman Gupta 81ab64f3c8 ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)
* ggml-cuda: add split-wise cuda graph

* add n-cpu-moe compare_llama_bench.py

* fix hip/musa builds
2026-01-24 14:25:20 +08:00
nullname 8af1f5f430 ggml-hexagon: flash-attn opt (#19025)
* optimize flash attention kernel by improving score computation and online softmax update

* wip

* Refactor online softmax update in flash attention kernel for improved performance

* Optimize flash attention kernel by replacing float array with HVX_Vector for score computation

* wip
2026-01-23 22:02:07 -08:00
Aleksander Grygier bd16b6145c chore: update webui build output 2026-01-24 01:32:36 +01:00
Aleksander Grygier 8428741034 feat: MCP Prompts WIP 2026-01-24 01:26:17 +01:00
Georgi Gerganov 557515be1e graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)
* graph : avoid branches between embedding and token inputs

* models : make deepstack graphs (e.g. Qwen3 VL) have constant topology

* ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI

* cont : pad token embeddings to n_embd_inp
2026-01-23 18:22:34 +02:00
Aleksander Grygier 3d88d0b6b2 chore: update webui build output 2026-01-23 15:21:56 +01:00
Aleksander Grygier 9c391d8e0d feat: UI improvements 2026-01-23 15:21:03 +01:00
Neo Zhang cb6caca191 [SYCL] use malloc to support both iGPU and dGPU in same time (#18992)
* use malloc to support both iGPU and dGPU in same time

* support windows

---------

Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2026-01-23 20:54:10 +08:00
Xuan-Son Nguyen b5b8fa1c8b chat : fix translategemma crash on common_chat_format_example (#19019) 2026-01-23 12:03:42 +01:00
Daniel Bevenius a14b960bc7 model-conversion : use BUILD_DIR variable in all scripts (#19015)
This commit modifies all the utility scripts to use an optional
BUILD_DIR variable/argument to specify the build directory.

The motivation for this is that Commit
3d55846a5c ("model-conversion : add
BUILD_DIR variable to run-converted-model scripts") introduced this
variable to the causal and embeddings scripts, but I missed the scripts
in the utils directory.
2026-01-23 09:01:36 +01:00
Alberto Cabrera Pérez 091a46cb8d ggml-cpu: aarm64: q5_K repack gemm and gemv (and generic) implementations (i8mm) (#18860)
* Boilerplate for q5_Kx8 REPACK on ARM and fallback

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Implements make_block_q5_Kx8 by extending make_block_q4_Kx8

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* q5_K repack gemm and gemv generics

* Gemm and Gemv ARM implementations (i8mm)

* Improved qh manipulation looking at non-repack vec_dot implementation

* Full unroll

* Apply Q5_K Gemv vand and vshl optimizations to gemm. Improve comments.

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Fix wrong fallback definitions of Q5_K

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Fixed comments. Reverted unnecessary formatting

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* Fixed typo in generic definitions

* Switching AND + Shift with Shift Insert. Better op interleaving.

* Vectorize + unroll the block scales

* Apply gemm optimizations to gemv

* Improve bias calculation

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
2026-01-23 09:55:08 +02:00
Aldehir Rojas a3e812811d cli : load parser definition (#19031)
* cli : load parser definition

* cont : only unload if a parser is defined
2026-01-22 20:31:22 -06:00
Xuan-Son Nguyen 51fa458a92 server : support preserving reasoning_content in assistant message (#18994)
* support reasoning_content input

* report template caps to webui

* add docs

* rm commented code
2026-01-22 21:30:06 +01:00
Georgi Gerganov a5eaa1d6a3 mla : make the V tensor a view of K (#18986)
* mla : pass V as a view of K to the FA op

* cuda : adjust mla logic to new layout

* kv-cache : fix rope shift

* tests : remove comment

* cuda : fix reusable_cutoff

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-22 22:09:01 +02:00
Johannes Gäßler e2baf02162 CUDA: fix alignment check for FA (#19023) 2026-01-22 20:39:25 +01:00