llama.cpp

Commit Graph

Author	SHA1	Message	Date
Daniel Bevenius	eef375ce16	sampling : remove sampling branching in output_reserve (#18811) * sampling : remove sampling branching in output_reserve This commit updates output_reserve in llama-context.cpp to always allocate sampling buffers regardless of whether sampling is needed for the current batch. The motivation for this is to avoid reallocations and branching based on the sampling requirements of the batch.	2026-01-28 05:59:30 +01:00
Nikhil Jain	06961e2876	ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976) * Squashed commit of the following: commit b3c6bf4b0450d8d452b934df27a0fb7cb53cd755 Author: Abhijit Ramesh <abhijitramesh2k@gmail.com> Date: Mon Dec 1 18:29:00 2025 -0800 ggml webgpu: fix xielu parameter passing (#11) The XIELU operation was incorrectly using static_cast to convert float parameters to uint32_t, which converted numeric values instead of preserving IEEE 754 bit patterns. This caused incorrect values to be interpreted by the GPU shader. * Use reinterpret_cast to preserve float bit patterns when passing through uint32_t params buffer * Update WGSL shader parameter types from u32 to f32 * Re-enable XIELU support (was disabled due to numerical issues) Fixes NMSE test failures for XIELU operation on WebGPU backend. commit `5ca9b5e49e` Author: neha-ha <137219201+neha-ha@users.noreply.github.com> Date: Tue Nov 18 12:17:00 2025 -0800 Refactored pipelines and workgroup calculations (#10) * refactored pipelines * refactored workgroup calculation * removed commented out block of prior maps * Clean up ceiling division pattern --------- Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu> Co-authored-by: Reese Levine <reeselevine1@gmail.com> Author: James Contini <jamescontini@gmail.com> Date: Wed Oct 29 23:13:06 2025 -0700 formatted embed wgsl and ggml-webgpu.cpp commit `e1f6baea31` Author: James Contini <jamescontini@gmail.com> Date: Wed Oct 29 23:08:37 2025 -0700 implemented REPL_Template support and removed bug in unary operators kernel commit `8c70b8fece` Author: James Contini <jamescontini@gmail.com> Date: Wed Oct 15 16:14:20 2025 -0700 responded and dealt with PR comments commit `f9282c660c` Author: James Contini <jamescontini@gmail.com> Date: Sun Oct 12 13:41:41 2025 -0700 removed unnecessary checking if node->src[1] exists for unary operators commit `4cf28d7dec` Author: James Contini <jamescontini@gmail.com> Date: Sun Oct 12 13:32:45 2025 -0700 All operators (including xielu) working commit `74c6add176` Author: James Contini <jamescontini@gmail.com> Date: Fri Oct 10 13:16:48 2025 -0700 fixed autoconfig commit `362749910b` Author: James Contini <jamescontini@gmail.com> Date: Fri Oct 10 13:10:46 2025 -0700 removed vestigial files commit `cb08583337` Author: James Contini <jamescontini@gmail.com> Date: Fri Oct 10 12:59:32 2025 -0700 abides by editor-config commit `5360e2852a` Author: James Contini <jamescontini@gmail.com> Date: Fri Oct 10 12:45:57 2025 -0700 rms_norm double declaration bug atoned commit `7b09baa4aa` Merge: `8a6ec843` `74b8fc17` Author: James Contini <jamescontini@gmail.com> Date: Fri Oct 10 11:50:03 2025 -0700 resolving merge conflicts commit `8a6ec843a5` Author: James Contini <jamescontini@gmail.com> Date: Wed Oct 8 18:06:47 2025 -0700 unary operators pass ggml tests commit `c3ae38278a` Author: James Contini <jamescontini@gmail.com> Date: Wed Oct 1 16:22:40 2025 -0700 neg passes backend test commit `aa1c9b2f88` Author: James Contini <jamescontini@gmail.com> Date: Tue Sep 30 23:55:27 2025 -0700 neg f16xf32xip builds and runs, haven't actually run a model that uses neg kernel yet though Co-authored-by: James Contini <jamescontini@gmail.com> Co-authored-by: Neha Abbas <neabbas@ucsc.edu> Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com> * Remove extra code and format * Add ops documentation (finally) * ggml webgpu: add SOFTPLUS unary operator Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * Follow Vulkan backend numerical stability pattern * ggml webgpu: add EXPM1 unary operator Implements EXPM1 (exp(x) - 1) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add FLOOR unary operator Implements FLOOR (rounds down to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add CEIL unary operator Implements CEIL (rounds up to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add ROUND unary operator Implements ROUND (rounds to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add TRUNC unary operator Implements TRUNC (truncates towards zero) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS) * Updates to webgpu get_memory * Move shared state (webgpu_context) and device creation out of registration context, device context, and buffer context, and move into backend context * Small cleanup * Move Instance, Device, Adapter, Device creation, and capabilities to global state while moving Queue, pipelines, and buffers to per-thread state. * Cleanups * More cleanup * Move staging_buf mutex to global context * Resolve merge * Resolve merge * Resolve merge * Clean up merge errors, delete forward declaration, and run clang-format * Rename device_init to backend_init * Move webgpu_context to backend_context * Move buffer context members into global context and refactor function calls * Run clang-format * Remove comments * Move parameter buffers to per-thread, add single memset_tensor param buf * Fix CI compilation issue * Fix builds for emscripten not supporting subgroups * cleanup * cleanup --------- Co-authored-by: Reese Levine <reeselevine1@gmail.com>	2026-01-27 20:53:36 -08:00
Vishal Singh	f2571df8b7	ggml-zendnn : update ZenDNN git tag to main branch (#19133)	2026-01-28 06:21:36 +08:00
Sigbjørn Skjæret	2b4cbd2834	jinja : implement mixed type object keys (#18955) * implement mixed type object keys * add tests * refactor * minor fixes * massive refactor * add more tests * forgotten tuples * fix array/object is_hashable * correct (albeit broken) jinja responses verified with transformers * improved hashing and equality * refactor hash function * more exhaustive test case * clean up * cont * cont (2) * missing cstring --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-01-27 19:50:42 +01:00
Aleksander Grygier	bdae58ceb8	refactor: Reuse MCP connections for health checks	2026-01-27 17:13:09 +01:00
Aleksander Grygier	0779dff7ca	chore: update webui build output	2026-01-27 17:03:59 +01:00
Aleksander Grygier	fcb7d1f899	fix: Sync streaming content to active messages	2026-01-27 16:46:19 +01:00
Aleksander Grygier	aff13cc085	refactor: Go back to simpler Stores + Services architecture	2026-01-27 15:57:12 +01:00
Aleksander Grygier	f7b7ae467e	feat: Introduce BaseClient for common store integration refactor(agentic-client): Extend BaseClient for store integration refactor(chat-client): Extend BaseClient for store integration refactor(conversations-client): Extend BaseClient for store integration	2026-01-27 15:27:30 +01:00
Aleksander Grygier	ace0de145a	feat: Introduce centralized API fetch utilities refactor(models): Use new API fetch utilities refactor(props): Use new API fetch utilities	2026-01-27 15:27:29 +01:00
Aleksander Grygier	948278d663	fix: Missing tool call handling	2026-01-27 15:11:06 +01:00
Aleksander Grygier	f40b377e34	refactor: Improves abort signal handling	2026-01-27 14:55:35 +01:00
David Lima	68ac3acb43	docs: Remove duplicated word on CUDA build section (#19136)	2026-01-27 14:48:51 +01:00
Aleksander Grygier	55e73cdde8	chore: update webui build output	2026-01-27 14:29:20 +01:00
Johannes Gäßler	a5bb8ba4c5	CUDA: tune GLM 4.7 Flash FA kernel selection logic (#19097)	2026-01-27 14:28:56 +01:00
Aleksander Grygier	7ba1b458d5	refactor: Create shared ActiveConversationStore to avoid circular dependency between ChatStore and ConversationsStore	2026-01-27 14:27:13 +01:00
Aleksander Grygier	9cce846f32	chore: update webui build output	2026-01-27 14:01:34 +01:00
Aleksander Grygier	6e7b3385a2	feat: Enhance ChatMessageMcpPromptContent display	2026-01-27 13:47:18 +01:00
Aleksander Grygier	8219404122	feat: Disable server card toggle when in error state	2026-01-27 13:47:18 +01:00
Aleksander Grygier	738ccd8a52	feat: Add auto-resizing textarea to KeyValuePairs component	2026-01-27 13:47:18 +01:00
Aleksander Grygier	f09eeed040	chore: update webui build output	2026-01-27 13:13:56 +01:00
Aleksander Grygier	70f96c96b6	refactor: Remove unused `getChatActionsContext` import	2026-01-27 13:10:24 +01:00
Aleksander Grygier	d43895d706	feat: Implement inactive chat conversation state cleanup	2026-01-27 13:10:24 +01:00
Aleksander Grygier	2281ac50c6	refactor: Use TTL cache for model properties in ModelsStore	2026-01-27 13:10:24 +01:00
Aleksander Grygier	2e2cb3d210	feat: Implement generic TTL cache utility	2026-01-27 13:10:24 +01:00
Aleksander Grygier	80ab2a5d1f	feat: Add cache configuration constants	2026-01-27 13:10:24 +01:00
Aleksander Grygier	8421d056be	chore: update webui build output	2026-01-27 13:01:12 +01:00
Aleksander Grygier	25df25a126	refactor: Adapt message child components to MessageEditContext	2026-01-27 13:00:37 +01:00
Aleksander Grygier	93992b10a7	refactor: Encapsulate message editing state and actions in ChatMessage.svelte	2026-01-27 13:00:37 +01:00
Aleksander Grygier	cbcd7956c8	refactor: Centralize chat-wide actions in ChatMessages.svelte	2026-01-27 13:00:36 +01:00
Aleksander Grygier	6b6ebd6bca	feat: Introduce Chat Actions and Message Edit Contexts	2026-01-27 13:00:36 +01:00
Aleksander Grygier	357fd8d591	chore: update webui build output	2026-01-27 12:23:47 +01:00
Aleksander Grygier	6cf823fb92	refactor: Components	2026-01-27 12:20:16 +01:00
Aleksander Grygier	8a8cd78237	refactor: Improve styling and overflow handling for ChatMessageMcpPromptContent	2026-01-27 11:56:55 +01:00
Aleksander Grygier	8ca3ffa076	feat: Add support for pasting MCP prompt attachments in ChatForm	2026-01-27 11:56:55 +01:00
Aleksander Grygier	770f993086	feat: Implement clipboard serialization/deserialization for MCP prompts	2026-01-27 11:56:55 +01:00
Aleksander Grygier	99d177d442	feat: Introduce clipboard types for MCP prompt attachments	2026-01-27 11:56:55 +01:00
Sigbjørn Skjæret	c0204a0893	ci : revert slim runner for winget (#19129)	2026-01-27 11:54:25 +01:00
Aleksander Grygier	69682dcb1a	fix: Edit Mode with MCP Prompt in message	2026-01-27 11:30:44 +01:00
Aleksander Grygier	f22e2be4d0	refactor: Use Popover for Chat Form Prompt Picker	2026-01-27 11:22:30 +01:00
Aleksander Grygier	7eff7a31de	feat: UI improvements	2026-01-27 11:07:20 +01:00
Aleksander Grygier	d4a6815ea9	chore: update webui build output	2026-01-27 10:40:34 +01:00
Aleksander Grygier	b834f165a4	Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp	2026-01-27 10:40:11 +01:00
Aleksander Grygier	e35adedb4f	chore: update webui build output	2026-01-27 10:27:40 +01:00
Aleksander Grygier	1b7f576baf	refactor: Components	2026-01-27 10:26:14 +01:00
Alberto Cabrera Pérez	be8890e721	ggml-cpu: aarch64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888) * Boilerplate for q6_K repack * q6_K repack to q6_Kx8 implementation Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> * q6_K generic gemv and gemm * wip, gemm_q6_K 8x8 * Still WIP: loading of q8s, q6h and q6l * first working version of q6_K gemm * Moved q6 loads outside of sb block, Unrolled inner loop * Replaced modulo with mask * First implementation of GEMV * ggml_vdotq_s32 -> vdotq_s32 * Reduce width of accumulators in q6_K gemv * Bsums instead of calc bias. Preload scales to use vget_lane. Unroll. * Reuse scales in GEMM (same GEMV opt) * Added todos for bsum and different qh repack * Arch fallback * VSLIQ for merging qh and ql * Removed TODO, already tested * Apply suggestions Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Removed unused import --------- Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-27 11:08:10 +02:00
Aleksander Grygier	b8221e8915	refactor: Utils	2026-01-27 09:04:41 +01:00
Gaurav Garg	a83c73a18a	[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042) * [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline. Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size. * Set the env variable in the CUDA backend registry allocation * Add link to PR in code comment * Remove warning logs and update documentation	2026-01-27 08:52:44 +02:00
Daniel Bevenius	fc3cdf32ce	common : clarify HTTPS build options in error message (#19103) * common : clarify HTTPS build options in error message This commit updates the https error message to provide clearer instructions for users who encounter the "HTTPS is not supported" error. The motivation for this is that it might not be clear to users that only one of these options is needed to enable HTTPS support. The LLAMA_OPENSSL option is also added to the message to cover all possible build configurations. * clarify that OpenSSL is the default for HTTPS support	2026-01-27 06:16:00 +01:00
shalinib-ibm	7afdfc9b84	ggml-cpu: Enable FP16 MMA kernels on PPC (#19060)	2026-01-27 11:52:34 +08:00

8119 Commits All Branches Search