llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aleksander Grygier	aff13cc085	refactor: Go back to simpler Stores + Services architecture	2026-01-27 15:57:12 +01:00
Aleksander Grygier	f7b7ae467e	feat: Introduce BaseClient for common store integration refactor(agentic-client): Extend BaseClient for store integration refactor(chat-client): Extend BaseClient for store integration refactor(conversations-client): Extend BaseClient for store integration	2026-01-27 15:27:30 +01:00
Aleksander Grygier	ace0de145a	feat: Introduce centralized API fetch utilities refactor(models): Use new API fetch utilities refactor(props): Use new API fetch utilities	2026-01-27 15:27:29 +01:00
Aleksander Grygier	948278d663	fix: Missing tool call handling	2026-01-27 15:11:06 +01:00
Aleksander Grygier	f40b377e34	refactor: Improves abort signal handling	2026-01-27 14:55:35 +01:00
David Lima	68ac3acb43	docs: Remove duplicated word on CUDA build section (#19136)	2026-01-27 14:48:51 +01:00
Aleksander Grygier	55e73cdde8	chore: update webui build output	2026-01-27 14:29:20 +01:00
Johannes Gäßler	a5bb8ba4c5	CUDA: tune GLM 4.7 Flash FA kernel selection logic (#19097)	2026-01-27 14:28:56 +01:00
Aleksander Grygier	7ba1b458d5	refactor: Create shared ActiveConversationStore to avoid circular dependency between ChatStore and ConversationsStore	2026-01-27 14:27:13 +01:00
Aleksander Grygier	9cce846f32	chore: update webui build output	2026-01-27 14:01:34 +01:00
Aleksander Grygier	6e7b3385a2	feat: Enhance ChatMessageMcpPromptContent display	2026-01-27 13:47:18 +01:00
Aleksander Grygier	8219404122	feat: Disable server card toggle when in error state	2026-01-27 13:47:18 +01:00
Aleksander Grygier	738ccd8a52	feat: Add auto-resizing textarea to KeyValuePairs component	2026-01-27 13:47:18 +01:00
Aleksander Grygier	f09eeed040	chore: update webui build output	2026-01-27 13:13:56 +01:00
Aleksander Grygier	70f96c96b6	refactor: Remove unused `getChatActionsContext` import	2026-01-27 13:10:24 +01:00
Aleksander Grygier	d43895d706	feat: Implement inactive chat conversation state cleanup	2026-01-27 13:10:24 +01:00
Aleksander Grygier	2281ac50c6	refactor: Use TTL cache for model properties in ModelsStore	2026-01-27 13:10:24 +01:00
Aleksander Grygier	2e2cb3d210	feat: Implement generic TTL cache utility	2026-01-27 13:10:24 +01:00
Aleksander Grygier	80ab2a5d1f	feat: Add cache configuration constants	2026-01-27 13:10:24 +01:00
Aleksander Grygier	8421d056be	chore: update webui build output	2026-01-27 13:01:12 +01:00
Aleksander Grygier	25df25a126	refactor: Adapt message child components to MessageEditContext	2026-01-27 13:00:37 +01:00
Aleksander Grygier	93992b10a7	refactor: Encapsulate message editing state and actions in ChatMessage.svelte	2026-01-27 13:00:37 +01:00
Aleksander Grygier	cbcd7956c8	refactor: Centralize chat-wide actions in ChatMessages.svelte	2026-01-27 13:00:36 +01:00
Aleksander Grygier	6b6ebd6bca	feat: Introduce Chat Actions and Message Edit Contexts	2026-01-27 13:00:36 +01:00
Aleksander Grygier	357fd8d591	chore: update webui build output	2026-01-27 12:23:47 +01:00
Aleksander Grygier	6cf823fb92	refactor: Components	2026-01-27 12:20:16 +01:00
Aleksander Grygier	8a8cd78237	refactor: Improve styling and overflow handling for ChatMessageMcpPromptContent	2026-01-27 11:56:55 +01:00
Aleksander Grygier	8ca3ffa076	feat: Add support for pasting MCP prompt attachments in ChatForm	2026-01-27 11:56:55 +01:00
Aleksander Grygier	770f993086	feat: Implement clipboard serialization/deserialization for MCP prompts	2026-01-27 11:56:55 +01:00
Aleksander Grygier	99d177d442	feat: Introduce clipboard types for MCP prompt attachments	2026-01-27 11:56:55 +01:00
Sigbjørn Skjæret	c0204a0893	ci : revert slim runner for winget (#19129)	2026-01-27 11:54:25 +01:00
Aleksander Grygier	69682dcb1a	fix: Edit Mode with MCP Prompt in message	2026-01-27 11:30:44 +01:00
Aleksander Grygier	f22e2be4d0	refactor: Use Popover for Chat Form Prompt Picker	2026-01-27 11:22:30 +01:00
Aleksander Grygier	7eff7a31de	feat: UI improvements	2026-01-27 11:07:20 +01:00
Aleksander Grygier	d4a6815ea9	chore: update webui build output	2026-01-27 10:40:34 +01:00
Aleksander Grygier	b834f165a4	Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp	2026-01-27 10:40:11 +01:00
Aleksander Grygier	e35adedb4f	chore: update webui build output	2026-01-27 10:27:40 +01:00
Aleksander Grygier	1b7f576baf	refactor: Components	2026-01-27 10:26:14 +01:00
Alberto Cabrera Pérez	be8890e721	ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888) * Boilerplate for q6_K repack * q6_K repack to q6_Kx8 implementation Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * q6_K generic gemv and gemm * wip, gemm_q6_K 8x8 * Still WIP: loading of q8s, q6h and q6l * first working version of q6_K gemm * Moved q6 loads outside of sb block, Unrolled inner loop * Replaced modulo with mask * First implementation of GEMV * ggml_vdotq_s32 -> vdotq_s32 * Reduce width of accumulators in q6_K gemv * Bsums instead of calc bias. Preload scales to use vget_lane. Unroll. * Reuse scales in GEMM (same GEMV opt) * Added todos for bsum and different qh repack * Arch fallback * VSLIQ for merging qh adn ql * Removed TODO, already tested * Apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Removed unused import --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-27 11:08:10 +02:00
Aleksander Grygier	b8221e8915	refactor: Utils	2026-01-27 09:04:41 +01:00
Gaurav Garg	a83c73a18a	[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042) * [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline. Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size. * Set the env variable in the CUDA backend registry allocation * Add link to PR in code comment * Remove warning logs and update documentation	2026-01-27 08:52:44 +02:00
Daniel Bevenius	fc3cdf32ce	common : clarify HTTPS build options in error message (#19103) * common : clarify HTTPS build options in error message This commit updates the https error message to provide clearer instructions for users who encounter the "HTTPS is not supported" error. The motivation for this is that it might not be clear to users that only one of these options are needed to enable HTTPS support. The LLAMA_OPENSSL option is also added to the message to cover all possible build configurations. * clarify that OpenSSL is the default for HTTPS support	2026-01-27 06:16:00 +01:00
shalinib-ibm	7afdfc9b84	ggml-cpu: Enable FP16 MMA kernels on PPC (#19060)	2026-01-27 11:52:34 +08:00
lhez	94eeb5967c	opencl: add flattened q6_K mv (#19054) * opencl: flatten `q6_K` and add `kernel_mul_mv_q6_K_f32_flat` * opencl: clean up * opencl: refactor q6_K mv - put loop body in `block_q_6_K_dot_y_flat` * opencl: tweak the workgroup size a bit * opencl: output 4 values per subgroup for `kernel_mul_mv_q6_K_f32_flat` * opencl: proper alignment for q6_K * opencl: boundary handling for flattened q6_K mv * opencl: rename q6_K mv kernel file * opencl: put flattened q6_K mv in its own file * opencl: use lower k in file name * opencl: use K in variable names	2026-01-26 19:36:24 -08:00
Johannes Gäßler	b0311c16d2	CUDA: fix padding of GQA to power of 2 in FA (#19115)	2026-01-26 23:24:58 +01:00
Georgi Gerganov	8f80d1b254	graph : fix nkvo offload with FA (#19105)	2026-01-26 20:18:34 +02:00
Pascal	5e71525cac	webui: remove unused sessionId, SDK handles it automatically	2026-01-26 16:41:44 +01:00
Pascal	19c32a4c96	webui: remove unused sessionId, SDK handles it automatically	2026-01-26 16:13:07 +01:00
Aleksander Grygier	d444c4a7e5	chore: update webui build output	2026-01-26 15:40:02 +01:00
Aleksander Grygier	1d518cac06	fix: Wait for all MCP Servers Health Checks to load	2026-01-26 15:38:10 +01:00

8112 Commits

All Branches