Commit Graph

8137 Commits

Author SHA1 Message Date
Aleksander Grygier 6cf823fb92 refactor: Components 2026-01-27 12:20:16 +01:00
Aleksander Grygier 8a8cd78237 refactor: Improve styling and overflow handling for ChatMessageMcpPromptContent 2026-01-27 11:56:55 +01:00
Aleksander Grygier 8ca3ffa076 feat: Add support for pasting MCP prompt attachments in ChatForm 2026-01-27 11:56:55 +01:00
Aleksander Grygier 770f993086 feat: Implement clipboard serialization/deserialization for MCP prompts 2026-01-27 11:56:55 +01:00
Aleksander Grygier 99d177d442 feat: Introduce clipboard types for MCP prompt attachments 2026-01-27 11:56:55 +01:00
Sigbjørn Skjæret c0204a0893 ci : revert slim runner for winget (#19129) 2026-01-27 11:54:25 +01:00
Aleksander Grygier 69682dcb1a fix: Edit Mode with MCP Prompt in message 2026-01-27 11:30:44 +01:00
Aleksander Grygier f22e2be4d0 refactor: Use Popover for Chat Form Prompt Picker 2026-01-27 11:22:30 +01:00
Aleksander Grygier 7eff7a31de feat: UI improvements 2026-01-27 11:07:20 +01:00
Aleksander Grygier d4a6815ea9 chore: update webui build output 2026-01-27 10:40:34 +01:00
Aleksander Grygier b834f165a4 Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp 2026-01-27 10:40:11 +01:00
Aleksander Grygier e35adedb4f chore: update webui build output 2026-01-27 10:27:40 +01:00
Aleksander Grygier 1b7f576baf refactor: Components 2026-01-27 10:26:14 +01:00
Alberto Cabrera Pérez be8890e721 ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)
* Boilerplate for q6_K repack

* q6_K repack to q6_Kx8 implementation

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* q6_K generic gemv and gemm

* wip, gemm_q6_K 8x8

* Still WIP: loading of q8s, q6h and q6l

* first working version of q6_K gemm

* Moved q6 loads outside of sb block, Unrolled inner loop

* Replaced modulo with mask

* First implementation of GEMV

* ggml_vdotq_s32 -> vdotq_s32

* Reduce width of accumulators in q6_K gemv

* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.

* Reuse scales in GEMM (same GEMV opt)

* Added todos for bsum and different qh repack

* Arch fallback

* VSLIQ for merging qh and ql

* Removed TODO, already tested

* Apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Removed unused import

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-27 11:08:10 +02:00
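
The "Replaced modulo with mask" step in commit be8890e721 above refers to a common micro-optimization: when the divisor is a power of two, `i % size` can be computed as `i & (size - 1)`. The sketch below only illustrates that identity; the function name `wrap_index` is hypothetical and is not taken from the commit.

```python
def wrap_index(i: int, size: int) -> int:
    """Return i modulo size using a bit mask.

    Only valid when size is a positive power of two, because then
    size - 1 has all low bits set and the AND keeps just those bits.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return i & (size - 1)


if __name__ == "__main__":
    # Compare the masked result with the plain modulo for a few indices.
    for i in (0, 7, 8, 13, 64):
        print(i, wrap_index(i, 8), i % 8)
```

The guard matters: for a size such as 6 the mask trick silently gives wrong results, so the sketch rejects it instead.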
Aleksander Grygier b8221e8915 refactor: Utils 2026-01-27 09:04:41 +01:00
Gaurav Garg a83c73a18a [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042)
* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full

With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline.
Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size.

* Set the env variable in the CUDA backend registry allocation

* Add link to PR in code comment

* Remove warning logs and update documentation
2026-01-27 08:52:44 +02:00
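
Commit a83c73a18a above sets the `CUDA_SCALE_LAUNCH_QUEUES` environment variable to `4x`. A minimal sketch of the same idea from a launcher script follows. It assumes the variable has to be in place before the CUDA runtime is initialized, and that a value already chosen by the user should be kept; the helper name `ensure_launch_queue_scale` is hypothetical, and the actual change lives in the C++ CUDA backend rather than in Python.

```python
import os
from typing import MutableMapping, Optional

ENV_NAME = "CUDA_SCALE_LAUNCH_QUEUES"


def ensure_launch_queue_scale(
    env: Optional[MutableMapping[str, str]] = None,
    scale: str = "4x",
) -> str:
    """Set CUDA_SCALE_LAUNCH_QUEUES unless it is already set.

    Returns the value that is in effect afterwards. An existing value
    is left untouched (an assumption of this sketch).
    """
    if env is None:
        env = os.environ
    return env.setdefault(ENV_NAME, scale)


if __name__ == "__main__":
    # Call this before starting any process or library that initializes CUDA.
    print(ENV_NAME, "=", ensure_launch_queue_scale())
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real process environment.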
Daniel Bevenius fc3cdf32ce common : clarify HTTPS build options in error message (#19103)
* common : clarify HTTPS build options in error message

This commit updates the https error message to provide clearer
instructions for users who encounter the "HTTPS is not supported" error.

The motivation for this is that it might not be clear to users that only
one of these options is needed to enable HTTPS support.
The LLAMA_OPENSSL option is also added to the message to cover all
possible build configurations.

* clarify that OpenSSL is the default for HTTPS support
2026-01-27 06:16:00 +01:00
shalinib-ibm 7afdfc9b84 ggml-cpu: Enable FP16 MMA kernels on PPC (#19060) 2026-01-27 11:52:34 +08:00
lhez 94eeb5967c opencl: add flattened q6_K mv (#19054)
* opencl: flatten `q6_K` and add `kernel_mul_mv_q6_K_f32_flat`

* opencl: clean up

* opencl: refactor q6_K mv - put loop body in `block_q_6_K_dot_y_flat`

* opencl: tweak the workgroup size a bit

* opencl: output 4 values per subgroup for `kernel_mul_mv_q6_K_f32_flat`

* opencl: proper alignment for q6_K

* opencl: boundary handling for flattened q6_K mv

* opencl: rename q6_K mv kernel file

* opencl: put flattened q6_K mv in its own file

* opencl: use lower k in file name

* opencl: use K in variable names
2026-01-26 19:36:24 -08:00
Johannes Gäßler b0311c16d2 CUDA: fix padding of GQA to power of 2 in FA (#19115) 2026-01-26 23:24:58 +01:00
Georgi Gerganov 8f80d1b254 graph : fix nkvo offload with FA (#19105) 2026-01-26 20:18:34 +02:00
Pascal 5e71525cac webui: remove unused sessionId, SDK handles it automatically 2026-01-26 16:41:44 +01:00
Pascal 19c32a4c96 webui: remove unused sessionId, SDK handles it automatically 2026-01-26 16:13:07 +01:00
Aleksander Grygier d444c4a7e5 chore: update webui build output 2026-01-26 15:40:02 +01:00
Aleksander Grygier 1d518cac06 fix: Wait for all MCP Servers Health Checks to load 2026-01-26 15:38:10 +01:00
Aleksander Grygier 82f26ad8e4 refactor: Cleanup 2026-01-26 15:33:27 +01:00
Aleksander Grygier 5bf1c86635 refactor: Cleanup
2026-01-26 15:28:50 +01:00
Sigbjørn Skjæret 142cbe2ac6 ci : use new 1vCPU runner for lightweight jobs (#19107)
* use new 1vCPU runner for lightweight jobs

* pyright is too heavy, look into ty some day

use new pip-install input
2026-01-26 15:22:49 +01:00
Aleksander Grygier 7b127db90c chore: update webui build output 2026-01-26 15:07:47 +01:00
Aleksander Grygier 717a868c23 feat: Mcp Server Selector 2026-01-26 15:03:05 +01:00
Aleksander Grygier e566d6641e fix: Scroll issues in DropdownMenuSearchable 2026-01-26 14:41:15 +01:00
Aleksander Grygier d675f403e3 chore: update webui build output 2026-01-26 14:33:58 +01:00
Aleksander Grygier ee0f0b277f feat: Improve Code blocks rendering + add auto scroll + improve global scroll bar behavior 2026-01-26 14:32:40 +01:00
Aleksander Grygier 6586ae71d2 chore: update webui build output 2026-01-26 12:34:21 +01:00
Aleksander Grygier c631e26a3f refactor: Components imports/exports structure & documentation 2026-01-26 12:30:53 +01:00
Georgi Gerganov 56f3ebf38e model : add correct type for GLM 4.7 Flash (#19106) 2026-01-26 11:24:30 +02:00
Aleksander Grygier b7d1de68c3 refactor: Cleanup 2026-01-26 09:54:44 +01:00
Aleksander Grygier 0a66568fc9 chore: update webui build output 2026-01-26 09:37:27 +01:00
Aleksander Grygier fa0cad2e6e refactor: Componentize Chat Form Prompt Picker 2026-01-26 09:36:13 +01:00
Aleksander Grygier 176abf3175 refactor: Utility function 2026-01-26 09:00:41 +01:00
Aleksander Grygier 5ee232d81c refactor: Use store methods 2026-01-26 08:52:57 +01:00
Johannes Gäßler 0c21677e43 CUDA: faster FA for GQA > 1 but not power of 2 (#19092) 2026-01-25 21:19:47 +01:00
ccbinn 0440bfd160 metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088)
Co-authored-by: chenbin11 <chenbin11@kuaishou.com>
2026-01-25 20:07:19 +02:00
Sigbjørn Skjæret 0bf5636938 convert : yield Gemma3N custom_map tensors directly (#19091) 2026-01-25 18:03:34 +01:00
Aman Gupta bcb43163ae ggml-cpu: Use tiled FA for prompt-processing (#19012)
* ggml-cpu: Use tiled FA for prompt-processing

FA performance on CPU suffers on long contexts because it essentially uses a vector kernel. This PR adds a tiled FA for prompt processing (PP). Tile sizes were tuned on an AMD EPYC single-socket 64-core machine.

* fix out of bounds for mask

* skip rows where there are all masks

* skip tile if mask is inf

* store mask in worksize

* check inf tile earlier
2026-01-25 23:25:58 +08:00
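
Commit bcb43163ae above describes a tiled FA that skips tiles whose mask is entirely -inf. The sketch below shows the general technique for a single query row: keys are processed in tiles with an online softmax, and a fully masked tile is skipped before any dot products are computed. It is a simplified pure-Python illustration under stated assumptions, not the kernel from the PR; all names (`tiled_attention_row`, `naive_attention_row`, the `tile` parameter) are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")
Row = Sequence[float]


def _scores(q: Row, keys: Sequence[Row], mask: Row, start: int, end: int) -> List[float]:
    """Scaled dot products of q with keys[start:end], plus the additive mask."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(a * b for a, b in zip(q, keys[j])) + mask[j]
            for j in range(start, end)]


def naive_attention_row(q: Row, keys: Sequence[Row], values: Sequence[Row], mask: Row) -> List[float]:
    """Reference: softmax over all keys at once."""
    scores = _scores(q, keys, mask, 0, len(keys))
    top = max(scores)
    if top == NEG_INF:  # every key is masked
        return [0.0] * len(values[0])
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[c] for w, v in zip(weights, values)) / total
            for c in range(len(values[0]))]


def tiled_attention_row(q: Row, keys: Sequence[Row], values: Sequence[Row],
                        mask: Row, tile: int = 4) -> List[float]:
    """Same result, computed tile by tile with a running max and denominator."""
    acc = [0.0] * len(values[0])
    running_max = NEG_INF
    denom = 0.0
    for start in range(0, len(keys), tile):
        end = min(start + tile, len(keys))
        if all(m == NEG_INF for m in mask[start:end]):
            continue  # fully masked tile contributes nothing, skip it early
        scores = _scores(q, keys, mask, start, end)
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new maximum.
        correction = math.exp(running_max - new_max)
        denom *= correction
        acc = [a * correction for a in acc]
        for s, j in zip(scores, range(start, end)):
            w = math.exp(s - new_max)  # exp(-inf) is 0.0 for masked keys
            denom += w
            acc = [a + w * v for a, v in zip(acc, values[j])]
        running_max = new_max
    if denom == 0.0:  # every tile was skipped
        return acc
    return [a / denom for a in acc]
```

The tiled version never materializes the full score vector, which is what makes the approach attractive for long contexts; the naive function is included only so the two can be compared.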
Georgi Gerganov d9c6ce46f7 kv-cache : support V-less cache (#19067)
* kv-cache : support V-less cache

* cuda : better check for V_is_K_view

* cuda : improve V_is_K_view check

* graph : add comments

* hparams : refactor
2026-01-25 15:48:56 +02:00
Aleksander Grygier ff0e927be2 chore: update webui build output 2026-01-25 13:38:25 +01:00
Aleksander Grygier ee9efae203 refactor: Enums 2026-01-25 13:37:08 +01:00
Aleksander Grygier 7f5284d597 refactor: Cleanup
2026-01-25 13:13:11 +01:00
Sigbjørn Skjæret 70d860824a convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084)
* fix Gemma3N and Ernie4.5Moe

* fix GraniteMoe
2026-01-25 13:05:05 +01:00