Commit Graph

8112 Commits

Author SHA1 Message Date
Aleksander Grygier aff13cc085 refactor: Go back to simpler Stores + Services architecture 2026-01-27 15:57:12 +01:00
Aleksander Grygier f7b7ae467e feat: Introduce BaseClient for common store integration
refactor(agentic-client): Extend BaseClient for store integration
refactor(chat-client): Extend BaseClient for store integration
refactor(conversations-client): Extend BaseClient for store integration
2026-01-27 15:27:30 +01:00
Aleksander Grygier ace0de145a feat: Introduce centralized API fetch utilities
refactor(models): Use new API fetch utilities
refactor(props): Use new API fetch utilities
2026-01-27 15:27:29 +01:00
Aleksander Grygier 948278d663 fix: Missing tool call handling 2026-01-27 15:11:06 +01:00
Aleksander Grygier f40b377e34 refactor: Improves abort signal handling 2026-01-27 14:55:35 +01:00
David Lima 68ac3acb43 docs: Remove duplicated word on CUDA build section (#19136) 2026-01-27 14:48:51 +01:00
Aleksander Grygier 55e73cdde8 chore: update webui build output 2026-01-27 14:29:20 +01:00
Johannes Gäßler a5bb8ba4c5 CUDA: tune GLM 4.7 Flash FA kernel selection logic (#19097) 2026-01-27 14:28:56 +01:00
Aleksander Grygier 7ba1b458d5 refactor: Create shared ActiveConversationStore to avoid circular dependency between ChatStore and ConversationsStore 2026-01-27 14:27:13 +01:00
Aleksander Grygier 9cce846f32 chore: update webui build output 2026-01-27 14:01:34 +01:00
Aleksander Grygier 6e7b3385a2 feat: Enhance ChatMessageMcpPromptContent display 2026-01-27 13:47:18 +01:00
Aleksander Grygier 8219404122 feat: Disable server card toggle when in error state 2026-01-27 13:47:18 +01:00
Aleksander Grygier 738ccd8a52 feat: Add auto-resizing textarea to KeyValuePairs component 2026-01-27 13:47:18 +01:00
Aleksander Grygier f09eeed040 chore: update webui build output 2026-01-27 13:13:56 +01:00
Aleksander Grygier 70f96c96b6 refactor: Remove unused `getChatActionsContext` import 2026-01-27 13:10:24 +01:00
Aleksander Grygier d43895d706 feat: Implement inactive chat conversation state cleanup 2026-01-27 13:10:24 +01:00
Aleksander Grygier 2281ac50c6 refactor: Use TTL cache for model properties in ModelsStore 2026-01-27 13:10:24 +01:00
Aleksander Grygier 2e2cb3d210 feat: Implement generic TTL cache utility 2026-01-27 13:10:24 +01:00
Aleksander Grygier 80ab2a5d1f feat: Add cache configuration constants 2026-01-27 13:10:24 +01:00
Aleksander Grygier 8421d056be chore: update webui build output 2026-01-27 13:01:12 +01:00
Aleksander Grygier 25df25a126 refactor: Adapt message child components to MessageEditContext 2026-01-27 13:00:37 +01:00
Aleksander Grygier 93992b10a7 refactor: Encapsulate message editing state and actions in ChatMessage.svelte 2026-01-27 13:00:37 +01:00
Aleksander Grygier cbcd7956c8 refactor: Centralize chat-wide actions in ChatMessages.svelte 2026-01-27 13:00:36 +01:00
Aleksander Grygier 6b6ebd6bca feat: Introduce Chat Actions and Message Edit Contexts 2026-01-27 13:00:36 +01:00
Aleksander Grygier 357fd8d591 chore: update webui build output 2026-01-27 12:23:47 +01:00
Aleksander Grygier 6cf823fb92 refactor: Components 2026-01-27 12:20:16 +01:00
Aleksander Grygier 8a8cd78237 refactor: Improve styling and overflow handling for ChatMessageMcpPromptContent 2026-01-27 11:56:55 +01:00
Aleksander Grygier 8ca3ffa076 feat: Add support for pasting MCP prompt attachments in ChatForm 2026-01-27 11:56:55 +01:00
Aleksander Grygier 770f993086 feat: Implement clipboard serialization/deserialization for MCP prompts 2026-01-27 11:56:55 +01:00
Aleksander Grygier 99d177d442 feat: Introduce clipboard types for MCP prompt attachments 2026-01-27 11:56:55 +01:00
Sigbjørn Skjæret c0204a0893 ci : revert slim runner for winget (#19129) 2026-01-27 11:54:25 +01:00
Aleksander Grygier 69682dcb1a fix: Edit Mode with MCP Prompt in message 2026-01-27 11:30:44 +01:00
Aleksander Grygier f22e2be4d0 refactor: Use Popover for Chat Form Prompt Picker 2026-01-27 11:22:30 +01:00
Aleksander Grygier 7eff7a31de feat: UI improvements 2026-01-27 11:07:20 +01:00
Aleksander Grygier d4a6815ea9 chore: update webui build output 2026-01-27 10:40:34 +01:00
Aleksander Grygier b834f165a4 Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp 2026-01-27 10:40:11 +01:00
Aleksander Grygier e35adedb4f chore: update webui build output 2026-01-27 10:27:40 +01:00
Aleksander Grygier 1b7f576baf refactor: Components 2026-01-27 10:26:14 +01:00
Alberto Cabrera Pérez be8890e721
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)
* Boilerplate for q6_K repack

* q6_K repack to q6_Kx8 implementation

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* q6_K generic gemv and gemm

* wip, gemm_q6_K 8x8

* Still WIP: loading of q8s, q6h and q6l

* first working version of q6_K gemm

* Moved q6 loads outside of sb block, Unrolled inner loop

* Replaced modulo with mask

* First implementation of GEMV

* ggml_vdotq_s32 -> vdotq_s32

* Reduce width of accumulators in q6_K gemv

* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.

* Reuse scales in GEMM (same GEMV opt)

* Added todos for bsum and different qh repack

* Arch fallback

* VSLIQ for merging qh and ql

* Removed TODO, already tested

* Apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Removed unused import

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-27 11:08:10 +02:00
Aleksander Grygier b8221e8915 refactor: Utils 2026-01-27 09:04:41 +01:00
Gaurav Garg a83c73a18a
[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042)
* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full

With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline.
Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size.

* Set the env variable in the CUDA backend registry allocation

* Add link to PR in code comment

* Remove warning logs and update documentation
2026-01-27 08:52:44 +02:00
Daniel Bevenius fc3cdf32ce
common : clarify HTTPS build options in error message (#19103)
* common : clarify HTTPS build options in error message

This commit updates the https error message to provide clearer
instructions for users who encounter the "HTTPS is not supported" error.

The motivation for this is that it might not be clear to users that only
one of these options is needed to enable HTTPS support.
The LLAMA_OPENSSL option is also added to the message to cover all
possible build configurations.

* clarify that OpenSSL is the default for HTTPS support
2026-01-27 06:16:00 +01:00
shalinib-ibm 7afdfc9b84 ggml-cpu: Enable FP16 MMA kernels on PPC (#19060) 2026-01-27 11:52:34 +08:00
lhez 94eeb5967c
opencl: add flattened q6_K mv (#19054)
* opencl: flatten `q6_K` and add `kernel_mul_mv_q6_K_f32_flat`

* opencl: clean up

* opencl: refactor q6_K mv - put loop body in `block_q_6_K_dot_y_flat`

* opencl: tweak the workgroup size a bit

* opencl: output 4 values per subgroup for `kernel_mul_mv_q6_K_f32_flat`

* opencl: proper alignment for q6_K

* opencl: boundary handling for flattened q6_K mv

* opencl: rename q6_K mv kernel file

* opencl: put flattened q6_K mv in its own file

* opencl: use lower k in file name

* opencl: use K in variable names
2026-01-26 19:36:24 -08:00
Johannes Gäßler b0311c16d2 CUDA: fix padding of GQA to power of 2 in FA (#19115) 2026-01-26 23:24:58 +01:00
Georgi Gerganov 8f80d1b254 graph : fix nkvo offload with FA (#19105) 2026-01-26 20:18:34 +02:00
Pascal 5e71525cac webui: remove unused sessionId, SDK handles it automatically 2026-01-26 16:41:44 +01:00
Pascal 19c32a4c96 webui: remove unused sessionId, SDK handles it automatically 2026-01-26 16:13:07 +01:00
Aleksander Grygier d444c4a7e5 chore: update webui build output 2026-01-26 15:40:02 +01:00
Aleksander Grygier 1d518cac06 fix: Wait for all MCP Servers Health Checks to load 2026-01-26 15:38:10 +01:00