Commit Graph

8448 Commits

Author SHA1 Message Date
Imad Saddik 70c10f0cc1
Merge f62e0d9b4e into 88915cb55c 2026-03-15 19:25:21 +00:00
Imad Saddik f62e0d9b4e chore: update webui build output 2026-03-15 19:25:09 +00:00
Imad Saddik 9e5d40a550 chore: set autoChatWidth to true by default 2026-03-15 19:18:21 +00:00
Georgi Gerganov 88915cb55c
server : fix wait in test_cancel_requests() test (#20601)
* server : fix wait in test_cancel_requests() test

* codeowners : add team for server tests
2026-03-15 20:54:37 +02:00
Sigbjørn Skjæret ebbf544ed1
sycl : fix for untransposed GDA recurrent state (#20583) 2026-03-15 19:10:15 +01:00
Imad Saddik 0dd4d9d588 chore: update webui build output 2026-03-15 18:09:22 +00:00
Imad Saddik d549ab4893 style: remove max width 48rem from agentic-content 2026-03-15 18:07:01 +00:00
Sigbjørn Skjæret b91d7dfe5b
ci : only save openvino caches on github-hosted master (#20593)
* only save openvino ccache on master

* disable toolkit cache if self-hosted

* only cache on github-hosted runners

* remove toolkit cache [no ci]
2026-03-15 18:58:13 +01:00
Imad Saddik a0eccc8652 chore: update webui build output 2026-03-15 17:55:17 +00:00
Imad Saddik 333bfc7231 chore: undo changes in ChatScreenProcessingInfo 2026-03-15 17:53:31 +00:00
Imad Saddik c6c63786c2 chore: update webui build output 2026-03-15 17:49:34 +00:00
Imad Saddik de04a9b0e6 style: reset the width of the processing info div 2026-03-15 17:48:16 +00:00
Imad Saddik 715ba4ee85 chore: update webui build output 2026-03-15 17:40:36 +00:00
Imad Saddik fa7d3a96c5 fix: keep the container spanning the whole width to fix scroll bar issue 2026-03-15 17:39:19 +00:00
Imad Saddik c399ec3c46 chore: update webui build output 2026-03-15 17:36:11 +00:00
Imad Saddik 72c1928dc9 refactor: move widthClasses.class to the container div 2026-03-15 17:34:31 +00:00
Johannes Gäßler ae40cd27c8
CUDA: limit number of FA stream-k CUDA blocks (#20586) 2026-03-15 18:30:47 +01:00
Imad Saddik 0cd953eea2 chore: update webui build output 2026-03-15 17:13:14 +00:00
Imad Saddik 6074619ba4 style: restore class for checkbox labels 2026-03-15 17:11:58 +00:00
Imad Saddik 297abf8450 chore: update webui build output 2026-03-15 17:09:59 +00:00
Imad Saddik d4034eff07 fix: update chatWidthClasses to use autoChatWidth configuration 2026-03-15 17:08:43 +00:00
Imad Saddik b73209694d chore: update webui build output 2026-03-15 17:03:58 +00:00
Imad Saddik 2630c27754 refactor: simplify chatWidthClasses getter logic and remove widthClasses.class 2026-03-15 17:02:41 +00:00
Imad Saddik 1a6f21f25c chore: revert package-lock.json to match master 2026-03-15 16:56:41 +00:00
Imad Saddik 95be04617e chore: update webui build output 2026-03-15 16:56:08 +00:00
Imad Saddik 2836834801 refactor: remove anything related to the custom chat width setting 2026-03-15 16:54:43 +00:00
Imad Saddik 20a8227933 chore: update webui build output 2026-03-15 16:44:35 +00:00
Imad Saddik 89647d5daf chore: downgrade @lucide/svelte version and remove custom chat width component 2026-03-15 16:43:18 +00:00
Pascal ceef6b5233
ggml: avoid creating CUDA context during device init (#20595) 2026-03-16 00:42:56 +08:00
Adrien Gallouët 07c6a59b4f
vendor : update cpp-httplib to 0.38.0 (#20578)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-15 17:30:06 +01:00
MoonShadow 8b7d340b6f
ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain (#20536)
* ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain

On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain
returns hipErrorInvalidValue because the hint is not applicable to UMA systems.
The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on
APU systems such as AMD Strix Halo (gfx1151).

Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it
without error checking and clear any resulting error with hipGetLastError().

Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory
issues on APU systems, and store totalGlobalMem in device info.

Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits
hipMallocManaged to ~64GB regardless of available system RAM. A fix has been
submitted upstream: https://github.com/ROCm/rocm-systems/pull/4077

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ggml/hip: remove unrelated changes, keep only hipMemAdviseSetCoarseGrain fix

---------

Co-authored-by: moonshadow-25 <moonshadow-25@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:23:58 +01:00
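Note on 8b7d340b6f: the "optional performance hint" handling described in that message can be sketched roughly as follows. This is a minimal mock, not the code from the commit. The names `mock_mem_advise`, `mock_get_last_error`, and `apply_optional_hint` are hypothetical stand-ins, and the sketch assumes a runtime with a sticky last-error slot that is reset when read.

```cpp
// Hypothetical stand-ins for a GPU runtime; these are NOT the real HIP API.
enum mock_error { MOCK_SUCCESS = 0, MOCK_ERROR_INVALID_VALUE = 1 };

// Sticky error slot (assumption: the runtime remembers the last failure).
static mock_error g_last_error = MOCK_SUCCESS;

// Pretends to apply an optional memory hint; fails on unified-memory devices.
static mock_error mock_mem_advise(bool is_uma) {
    g_last_error = is_uma ? MOCK_ERROR_INVALID_VALUE : MOCK_SUCCESS;
    return g_last_error;
}

// Returns the last recorded error and resets the slot to success.
static mock_error mock_get_last_error() {
    mock_error e = g_last_error;
    g_last_error = MOCK_SUCCESS;
    return e;
}

// Soft handling: issue the hint, tolerate failure, clear the sticky error.
// Returns true if the hint was applied, false if it was skipped.
static bool apply_optional_hint(bool is_uma) {
    if (mock_mem_advise(is_uma) != MOCK_SUCCESS) {
        (void) mock_get_last_error(); // clear so later checks do not see a stale error
        return false;
    }
    return true;
}
```

In this sketch a failed hint is reported to the caller as `false` rather than aborting, which mirrors the message's intent of not treating the hint as fatal.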
Eric Hsieh 559646472d
fix: prevent nullptr dereference (#20552) 2026-03-15 16:51:49 +01:00
Sigbjørn Skjæret cf45437d35
codeowners : use teams (#20526)
* use teams

* update

* update

* update

* update

* update
2026-03-15 14:26:10 +01:00
Georgi Gerganov 9cd4ebcfb1
ci : split build.yml + server.yml (#20546)
* ci : split build.yml

* cont : split server.yml

* cont : reduce paths

* cont : split build-android.yml + update paths

* ci : make msys workflows manual (#20588)

* ci : make cross-build workflows manual (#20585)

* cont : fix release paths

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-15 15:11:17 +02:00
Sigbjørn Skjæret 89d0aec042
convert : support contiguous method on lora tensors (#20489) 2026-03-15 12:15:12 +01:00
Bartowski b9da4444df
ggml : guard against sumq2 being 0 in IQ4_NL (#20460) 2026-03-15 10:47:28 +02:00
PikaPikachu 617db241aa
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (#19478)
* mmvq: add RDNA3/RDNA4-specific parameter table (nwarps=8, rows=1)

* mmvq: add dedicated RDNA3 parameter table

* mmvq: exclude RDNA3.5 (gfx1150/1151) from RDNA3 table
2026-03-15 08:33:39 +01:00
Ruben Ortlam 1a3d8edbba
vulkan: use graphics queue on AMD (#20551)
* vulkan: use graphics queue on AMD for slightly better performance

* disable async transfer queue on AMD
2026-03-15 08:18:54 +01:00
sprayandwipe 6b10a82c00
kv-cache : fix reading llama_kv_cell_ext during state read (#20273)
Co-authored-by: sid <sid@ragingfist.net>
2026-03-15 09:11:19 +02:00
Michael Wand d23355afc3
model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support (#20506) 2026-03-14 22:44:42 +01:00
Georgi Gerganov b30a5fdf37
metal : add FA specialization for HSK = 320, HSV = 256 (#20549) 2026-03-14 23:15:47 +02:00
Georgi Gerganov b4768955c4
ci : move self-hosted workflows to separate files (#20540) 2026-03-14 23:15:35 +02:00
Gerard Guillemas Martos fc350fdf96
docker : force Python 3.13 in Vulkan container (#20530)
* ci: force Python 3.13 in Vulkan container

* remove unnecessary `update-alternatives` line
2026-03-14 21:37:09 +01:00
Eve 3a6f059909
ci : try to optimize some jobs (#20521)
* force arm version to test

* run on either x86 or arm if we can help it, this only works for runs without ccache

* readd other jobs

* remove ccache
2026-03-14 20:27:52 +01:00
Max Krasnyansky 609ea50026
hexagon: Q4_0 and MXFP4 repack fixes (#20527)
* hexagon: fix tail corruption with rows sizes not multiple of 256

* hexagon: use different stride for repacking partial blocks

* hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks

Previous commit changed the repacking to use even:odd (0:1,2:3,..) packing
instead of the original (0:128,1:129,...) packing in order to fix tail corruption.
Since the mm kernels already deal with partial tails we can use even:odd
packing only for the last block.
This avoids the performance penalty of having to shuffle to zip the elements
in the common case.

* hex-mm: update rmpy x8 for better optimizations

* hex-mm: tighten supported MUL_MAT checks to avoid spurious failures

* hex-mm: use vzero to init accumulators

* hex-mm: properly call partial rmpy_x8
2026-03-14 11:09:08 -07:00
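Note on 609ea50026: the two packings named in that message, (0:128,1:129,...) and even:odd (0:1,2:3,...), can be illustrated with a small index-mapping sketch. This is not the Hexagon kernel code. The function names are hypothetical, and placing the first element of each pair in the low nibble is an assumption, since the message does not state the nibble order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlock = 256; // elements per full block, per the commit message

// Split layout: byte i holds element i (low nibble) and element i + 128 (high nibble).
std::array<std::uint8_t, kBlock / 2> pack_split(const std::array<std::uint8_t, kBlock>& q) {
    std::array<std::uint8_t, kBlock / 2> out{};
    for (std::size_t i = 0; i < kBlock / 2; ++i) {
        out[i] = static_cast<std::uint8_t>((q[i] & 0x0F) | ((q[i + kBlock / 2] & 0x0F) << 4));
    }
    return out;
}

// Even:odd layout: byte i holds element 2i (low nibble) and element 2i + 1 (high nibble).
std::array<std::uint8_t, kBlock / 2> pack_even_odd(const std::array<std::uint8_t, kBlock>& q) {
    std::array<std::uint8_t, kBlock / 2> out{};
    for (std::size_t i = 0; i < kBlock / 2; ++i) {
        out[i] = static_cast<std::uint8_t>((q[2 * i] & 0x0F) | ((q[2 * i + 1] & 0x0F) << 4));
    }
    return out;
}
```

With even:odd packing, the first n elements of a partial block occupy only the first ceil(n/2) bytes, whereas the split layout pairs each element with one 128 positions away. The message itself does not spell out the exact corruption mechanism, so the sketch shows only the two index mappings.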
Georgi Gerganov 9f774e45ee
ci : reduce webgpu tests timeout to 900s (#20538)
[no ci]
2026-03-14 17:08:26 +02:00
Xuan-Son Nguyen 94d0262277
mtmd: add llama-mtmd-debug binary (#20508)
* mtmd: add llama-mtmd-debug binary

* adapt

* fixes

* fix compile error

* fix windows compile error

* rm legacy clip_debug_encode()

* add MTMD_API to fix build
2026-03-14 15:52:29 +01:00
Neo Zhang a93c0ef0fa
add op gated_delta_net (#20455) 2026-03-14 22:01:57 +08:00
Chedrian07 710878a7dd
webui: restore code preview iframe origin isolation (#20477) 2026-03-14 11:28:28 +01:00
Adrien Gallouët 0685848bc6
scripts : remove get-wikitext-103.sh (#20543)
It doesn't work and no one seems to use it.

    $ wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
    HTTP request sent, awaiting response... 301 Moved Permanently
    Location: unspecified
    ERROR: Redirection (301) without location.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-14 11:22:04 +01:00