8119 Commits

Author SHA1 Message Date
Daniel Bevenius eef375ce16
sampling : remove sampling branching in output_reserve (#18811)
* sampling : remove sampling branching in output_reserve

This commit updates output_reserve in llama-context.cpp to always
allocate sampling buffers regardless of whether sampling is needed for
the current batch.

The motivation for this is to avoid reallocations and branching based on
the sampling requirements of the batch.
2026-01-28 05:59:30 +01:00
Nikhil Jain 06961e2876
ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976)
* Squashed commit of the following:

commit b3c6bf4b0450d8d452b934df27a0fb7cb53cd755
Author: Abhijit Ramesh <abhijitramesh2k@gmail.com>
Date:   Mon Dec 1 18:29:00 2025 -0800

    ggml webgpu: fix xielu parameter passing (#11)

    The XIELU operation was incorrectly using static_cast to convert
    float parameters to uint32_t, which converted numeric values instead
    of preserving IEEE 754 bit patterns. This caused incorrect values
    to be interpreted by the GPU shader.

    * Use reinterpret_cast to preserve float bit patterns when passing
      through uint32_t params buffer
    * Update WGSL shader parameter types from u32 to f32
    * Re-enable XIELU support (was disabled due to numerical issues)

    Fixes NMSE test failures for XIELU operation on WebGPU backend.
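The difference between converting a float's value and preserving its bit pattern, as described in the XIELU fix above, can be sketched in a few lines. This is an illustration only, not the backend's code: `float_bits`, `bits_to_float`, and `demo` are hypothetical names, and the sketch uses `std::memcpy` instead of the `reinterpret_cast` named in the commit, because `memcpy` is the well-defined way to copy a bit pattern in standard C++.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy the IEEE 754 bits of a float into a uint32_t so the
// value can travel through a buffer of u32 parameters unchanged.
static uint32_t float_bits(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "float must be 32 bits wide");
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

// Hypothetical inverse: reinterpret the stored bits as a float again.
static float bits_to_float(uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

void demo() {
    const float alpha = 0.8f;
    // static_cast converts the numeric value, so the fractional part is lost.
    assert(static_cast<uint32_t>(alpha) == 0u);
    // Copying the bits keeps the original float recoverable on the other side.
    assert(bits_to_float(float_bits(alpha)) == alpha);
}
```

With the numeric conversion, a parameter such as 0.8 arrives as the integer 0; with the bit copy, the receiving side can read the same 32 bits back as the original float.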

commit 5ca9b5e49e
Author: neha-ha <137219201+neha-ha@users.noreply.github.com>
Date:   Tue Nov 18 12:17:00 2025 -0800

    Refactored pipelines and workgroup calculations (#10)

    * refactored pipelines

    * refactored workgroup calculation

    * removed commented out block of prior maps

    * Clean up ceiling division pattern

    ---------

    Co-authored-by: Neha Abbas <nehaabbas@eduroam-169-233-141-223.ucsc.edu>
    Co-authored-by: Reese Levine <reeselevine1@gmail.com>

Author: James Contini <jamescontini@gmail.com>
Date:   Wed Oct 29 23:13:06 2025 -0700

    formatted embed wgsl and ggml-webgpu.cpp

commit e1f6baea31
Author: James Contini <jamescontini@gmail.com>
Date:   Wed Oct 29 23:08:37 2025 -0700

    implemented REPL_Template support and removed bug in unary operators kernel

commit 8c70b8fece
Author: James Contini <jamescontini@gmail.com>
Date:   Wed Oct 15 16:14:20 2025 -0700

    responded and dealt with PR comments

commit f9282c660c
Author: James Contini <jamescontini@gmail.com>
Date:   Sun Oct 12 13:41:41 2025 -0700

    removed unnecessary checking if node->src[1] exists for unary operators

commit 4cf28d7dec
Author: James Contini <jamescontini@gmail.com>
Date:   Sun Oct 12 13:32:45 2025 -0700

    All operators (including xielu) working

commit 74c6add176
Author: James Contini <jamescontini@gmail.com>
Date:   Fri Oct 10 13:16:48 2025 -0700

    fixed autoconfig

commit 362749910b
Author: James Contini <jamescontini@gmail.com>
Date:   Fri Oct 10 13:10:46 2025 -0700

    removed vestigial files

commit cb08583337
Author: James Contini <jamescontini@gmail.com>
Date:   Fri Oct 10 12:59:32 2025 -0700

    abides by editor-config

commit 5360e2852a
Author: James Contini <jamescontini@gmail.com>
Date:   Fri Oct 10 12:45:57 2025 -0700

    rms_norm double declaration bug atoned

commit 7b09baa4aa
Merge: 8a6ec843 74b8fc17
Author: James Contini <jamescontini@gmail.com>
Date:   Fri Oct 10 11:50:03 2025 -0700

    resolving merge conflicts

commit 8a6ec843a5
Author: James Contini <jamescontini@gmail.com>
Date:   Wed Oct 8 18:06:47 2025 -0700

    unary operators pass ggml tests

commit c3ae38278a
Author: James Contini <jamescontini@gmail.com>
Date:   Wed Oct 1 16:22:40 2025 -0700

    neg passes backend test

commit aa1c9b2f88
Author: James Contini <jamescontini@gmail.com>
Date:   Tue Sep 30 23:55:27 2025 -0700

    neg f16xf32xip builds and runs, haven't actually run a model that uses neg kernel yet though

Co-authored-by: James Contini <jamescontini@gmail.com>
Co-authored-by: Neha Abbas <neabbas@ucsc.edu>
Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>

* Remove extra code and format

* Add ops documentation (finally)

* ggml webgpu: add SOFTPLUS unary operator

Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
precision for intermediate calculations to prevent f16 overflow.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* Follow Vulkan backend numerical stability pattern
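The overflow concern behind the SOFTPLUS entry above can be illustrated on the CPU. The sketch below is an assumption-laden example, not the WGSL shader or the Vulkan pattern it follows: `softplus_naive` and `softplus_stable` are hypothetical names, and the stable form shown, max(x, 0) + log1p(exp(-|x|)), is one common rearrangement rather than the one confirmed by the commit. It uses `float` throughout, where `exp(x)` overflows near x ≈ 88.7; in f16, whose largest finite value is 65504, the same overflow already occurs near x ≈ 11.09.

```cpp
#include <cassert>
#include <cmath>

// Naive form: log(1 + exp(x)). exp(x) overflows to +inf for large x,
// so the result becomes +inf even though softplus(x) is close to x.
static float softplus_naive(float x) {
    return std::log(1.0f + std::exp(x));
}

// Stable rearrangement: the exponent is never positive, so exp() cannot overflow.
static float softplus_stable(float x) {
    assert(!std::isnan(x));
    return std::fmax(x, 0.0f) + std::log1p(std::exp(-std::fabs(x)));
}
```

For small inputs both functions agree; for large inputs only the rearranged form stays finite.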

* ggml webgpu: add EXPM1 unary operator

Implements EXPM1 (exp(x) - 1) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add FLOOR unary operator

Implements FLOOR (rounds down to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add CEIL unary operator

Implements CEIL (rounds up to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add ROUND unary operator

Implements ROUND (rounds to nearest integer) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support

* ggml webgpu: add TRUNC unary operator

Implements TRUNC (truncates towards zero) with f16/f32 support.

* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
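The four rounding operators listed above differ mainly on negative and fractional inputs. The sketch below uses the C++ standard library equivalents to show the distinction; `Rounded` and `round_all` are hypothetical names for illustration. One assumption to note: `std::round` rounds halfway cases away from zero, and the commit messages do not say how the WebGPU ROUND shader treats ties, so the tie behaviour here should not be read as a statement about the backend.

```cpp
#include <cassert>
#include <cmath>

// Results of the four rounding modes applied to the same input.
struct Rounded {
    float down;        // FLOOR: toward negative infinity
    float up;          // CEIL:  toward positive infinity
    float nearest;     // ROUND: nearest integer (std::round: ties away from zero)
    float toward_zero; // TRUNC: drop the fractional part
};

static Rounded round_all(float x) {
    assert(std::isfinite(x));
    return { std::floor(x), std::ceil(x), std::round(x), std::trunc(x) };
}
```

For example, an input of -1.5 gives -2 for `down`, -1 for `up`, -2 for `nearest` (under the `std::round` tie rule), and -1 for `toward_zero`.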

* docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)

* Updates to webgpu get_memory

* Move shared state (webgpu_context) and device creation out of registration context, device context, and buffer context, and move into backend context

* Small cleanup

* Move Instance, Device, Adapter, Device creation, and capabilities to global state while moving Queue, pipelines, and buffers to per-thread state.

* Cleanups

* More cleanup

* Move staging_buf mutex to global context

* Resolve merge

* Resolve merge

* Resolve merge

* Clean up merge errors, delete forward declaration, and run clang-format

* Rename device_init to backend_init

* Move webgpu_context to backend_context

* Move buffer context members into global context and refactor function calls

* Run clang-format

* Remove comments

* Move parameter buffers to per-thread, add single memset_tensor param buf

* Fix CI compilation issue

* Fix builds for emscripten not supporting subgroups

* cleanup

* cleanup

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
2026-01-27 20:53:36 -08:00
Vishal Singh f2571df8b7
ggml-zendnn : update ZenDNN git tag to main branch (#19133) 2026-01-28 06:21:36 +08:00
Sigbjørn Skjæret 2b4cbd2834
jinja : implement mixed type object keys (#18955)
* implement mixed type object keys

* add tests

* refactor

* minor fixes

* massive refactor

* add more tests

* forgotten tuples

* fix array/object is_hashable

* correct (albeit broken) jinja responses

verified with transformers

* improved hashing and equality

* refactor hash function

* more exhaustive test case

* clean up

* cont

* cont (2)

* missing cstring

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2026-01-27 19:50:42 +01:00
Aleksander Grygier bdae58ceb8 refactor: Reuse MCP connections for health checks 2026-01-27 17:13:09 +01:00
Aleksander Grygier 0779dff7ca chore: update webui build output 2026-01-27 17:03:59 +01:00
Aleksander Grygier fcb7d1f899 fix: Sync streaming content to active messages 2026-01-27 16:46:19 +01:00
Aleksander Grygier aff13cc085 refactor: Go back to simpler Stores + Services architecture 2026-01-27 15:57:12 +01:00
Aleksander Grygier f7b7ae467e feat: Introduce BaseClient for common store integration
refactor(agentic-client): Extend BaseClient for store integration
refactor(chat-client): Extend BaseClient for store integration
refactor(conversations-client): Extend BaseClient for store integration
2026-01-27 15:27:30 +01:00
Aleksander Grygier ace0de145a feat: Introduce centralized API fetch utilities
refactor(models): Use new API fetch utilities
refactor(props): Use new API fetch utilities
2026-01-27 15:27:29 +01:00
Aleksander Grygier 948278d663 fix: Missing tool call handling 2026-01-27 15:11:06 +01:00
Aleksander Grygier f40b377e34 refactor: Improves abort signal handling 2026-01-27 14:55:35 +01:00
David Lima 68ac3acb43
docs: Remove duplicated word on CUDA build section (#19136) 2026-01-27 14:48:51 +01:00
Aleksander Grygier 55e73cdde8 chore: update webui build output 2026-01-27 14:29:20 +01:00
Johannes Gäßler a5bb8ba4c5
CUDA: tune GLM 4.7 Flash FA kernel selection logic (#19097) 2026-01-27 14:28:56 +01:00
Aleksander Grygier 7ba1b458d5 refactor: Create shared ActiveConversationStore to avoid circular dependency between ChatStore and ConversationsStore 2026-01-27 14:27:13 +01:00
Aleksander Grygier 9cce846f32 chore: update webui build output 2026-01-27 14:01:34 +01:00
Aleksander Grygier 6e7b3385a2 feat: Enhance ChatMessageMcpPromptContent display 2026-01-27 13:47:18 +01:00
Aleksander Grygier 8219404122 feat: Disable server card toggle when in error state 2026-01-27 13:47:18 +01:00
Aleksander Grygier 738ccd8a52 feat: Add auto-resizing textarea to KeyValuePairs component 2026-01-27 13:47:18 +01:00
Aleksander Grygier f09eeed040 chore: update webui build output 2026-01-27 13:13:56 +01:00
Aleksander Grygier 70f96c96b6 refactor: Remove unused `getChatActionsContext` import 2026-01-27 13:10:24 +01:00
Aleksander Grygier d43895d706 feat: Implement inactive chat conversation state cleanup 2026-01-27 13:10:24 +01:00
Aleksander Grygier 2281ac50c6 refactor: Use TTL cache for model properties in ModelsStore 2026-01-27 13:10:24 +01:00
Aleksander Grygier 2e2cb3d210 feat: Implement generic TTL cache utility 2026-01-27 13:10:24 +01:00
Aleksander Grygier 80ab2a5d1f feat: Add cache configuration constants 2026-01-27 13:10:24 +01:00
Aleksander Grygier 8421d056be chore: update webui build output 2026-01-27 13:01:12 +01:00
Aleksander Grygier 25df25a126 refactor: Adapt message child components to MessageEditContext 2026-01-27 13:00:37 +01:00
Aleksander Grygier 93992b10a7 refactor: Encapsulate message editing state and actions in ChatMessage.svelte 2026-01-27 13:00:37 +01:00
Aleksander Grygier cbcd7956c8 refactor: Centralize chat-wide actions in ChatMessages.svelte 2026-01-27 13:00:36 +01:00
Aleksander Grygier 6b6ebd6bca feat: Introduce Chat Actions and Message Edit Contexts 2026-01-27 13:00:36 +01:00
Aleksander Grygier 357fd8d591 chore: update webui build output 2026-01-27 12:23:47 +01:00
Aleksander Grygier 6cf823fb92 refactor: Components 2026-01-27 12:20:16 +01:00
Aleksander Grygier 8a8cd78237 refactor: Improve styling and overflow handling for ChatMessageMcpPromptContent 2026-01-27 11:56:55 +01:00
Aleksander Grygier 8ca3ffa076 feat: Add support for pasting MCP prompt attachments in ChatForm 2026-01-27 11:56:55 +01:00
Aleksander Grygier 770f993086 feat: Implement clipboard serialization/deserialization for MCP prompts 2026-01-27 11:56:55 +01:00
Aleksander Grygier 99d177d442 feat: Introduce clipboard types for MCP prompt attachments 2026-01-27 11:56:55 +01:00
Sigbjørn Skjæret c0204a0893
ci : revert slim runner for winget (#19129) 2026-01-27 11:54:25 +01:00
Aleksander Grygier 69682dcb1a fix: Edit Mode with MCP Prompt in message 2026-01-27 11:30:44 +01:00
Aleksander Grygier f22e2be4d0 refactor: Use Popover for Chat Form Prompt Picker 2026-01-27 11:22:30 +01:00
Aleksander Grygier 7eff7a31de feat: UI improvements 2026-01-27 11:07:20 +01:00
Aleksander Grygier d4a6815ea9 chore: update webui build output 2026-01-27 10:40:34 +01:00
Aleksander Grygier b834f165a4 Merge remote-tracking branch 'origin/allozaur/mcp-mvp' into allozaur/mcp-mvp 2026-01-27 10:40:11 +01:00
Aleksander Grygier e35adedb4f chore: update webui build output 2026-01-27 10:27:40 +01:00
Aleksander Grygier 1b7f576baf refactor: Components 2026-01-27 10:26:14 +01:00
Alberto Cabrera Pérez be8890e721
ggml-cpu: aarm64: q6_K repack gemm and gemv (and generic) implementations (i8mm) #18860 (#18888)
* Boilerplate for q6_K repack

* q6_K repack to q6_Kx8 implementation

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>

* q6_K generic gemv and gemm

* wip, gemm_q6_K 8x8

* Still WIP: loading of q8s, q6h and q6l

* first working version of q6_K gemm

* Moved q6 loads outside of sb block, Unrolled inner loop

* Replaced modulo with mask

* First implementation of GEMV

* ggml_vdotq_s32 -> vdotq_s32

* Reduce width of accumulators in q6_K gemv

* Bsums instead of calc bias. Preload scales to use vget_lane. Unroll.

* Reuse scales in GEMM (same GEMV opt)

* Added todos for bsum and different qh repack

* Arch fallback

* VSLIQ for merging qh and ql

* Removed TODO, already tested

* Apply suggestions

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Removed unused import

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@liquid.ai>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-27 11:08:10 +02:00
Aleksander Grygier b8221e8915 refactor: Utils 2026-01-27 09:04:41 +01:00
Gaurav Garg a83c73a18a
[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042)
* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full

With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline.
Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size.

* Set the env variable in the CUDA backend registry allocation

* Add link to PR in code comment

* Remove warning logs and update documentation
2026-01-27 08:52:44 +02:00
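The approach described in a83c73a18a above, setting `CUDA_SCALE_LAUNCH_QUEUES` to `4x` from inside the process, might look roughly like the sketch below. It is not the code from the PR: `ensure_launch_queue_scale` is a hypothetical helper, and leaving an existing user-provided value untouched is an assumption about desirable behaviour, not something the commit message states. The variable presumably has to be set before the CUDA runtime initializes for it to take effect.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: set CUDA_SCALE_LAUNCH_QUEUES only if it is not already
// defined, then return the value that is in effect for this process.
static std::string ensure_launch_queue_scale(const char * value) {
    assert(value != nullptr);
    const char * name = "CUDA_SCALE_LAUNCH_QUEUES";
    if (std::getenv(name) == nullptr) {
#ifdef _WIN32
        _putenv_s(name, value);
#else
        setenv(name, value, /*overwrite=*/0);
#endif
    }
    const char * current = std::getenv(name);
    return current != nullptr ? current : "";
}
```

Calling `ensure_launch_queue_scale("4x")` early at startup would request the larger command buffer while still letting a user override it from the shell.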
Daniel Bevenius fc3cdf32ce
common : clarify HTTPS build options in error message (#19103)
* common : clarify HTTPS build options in error message

This commit updates the https error message to provide clearer
instructions for users who encounter the "HTTPS is not supported" error.

The motivation for this is that it might not be clear to users that only
one of these options is needed to enable HTTPS support.
The LLAMA_OPENSSL option is also added to the message to cover all
possible build configurations.

* clarify that OpenSSL is the default for HTTPS support
2026-01-27 06:16:00 +01:00
shalinib-ibm 7afdfc9b84
ggml-cpu: Enable FP16 MMA kernels on PPC (#19060) 2026-01-27 11:52:34 +08:00