Imad Saddik
70c10f0cc1
Merge f62e0d9b4e into 88915cb55c
2026-03-15 19:25:21 +00:00
Imad Saddik
f62e0d9b4e
chore: update webui build output
2026-03-15 19:25:09 +00:00
Imad Saddik
9e5d40a550
chore: set autoChatWidth to true by default
2026-03-15 19:18:21 +00:00
Georgi Gerganov
88915cb55c
server : fix wait in test_cancel_requests() test ( #20601 )
...
* server : fix wait in test_cancel_requests() test
* codeowners : add team for server tests
2026-03-15 20:54:37 +02:00
Sigbjørn Skjæret
ebbf544ed1
sycl : fix for untransposed GDA recurrent state ( #20583 )
2026-03-15 19:10:15 +01:00
Imad Saddik
0dd4d9d588
chore: update webui build output
2026-03-15 18:09:22 +00:00
Imad Saddik
d549ab4893
style: remove max width 48rem from agentic-content
2026-03-15 18:07:01 +00:00
Sigbjørn Skjæret
b91d7dfe5b
ci : only save openvino caches on github-hosted master ( #20593 )
...
* only save openvino ccache on master
* disable toolkit cache if self-hosted
* only cache on github-hosted runners
* remove toolkit cache [no ci]
2026-03-15 18:58:13 +01:00
Imad Saddik
a0eccc8652
chore: update webui build output
2026-03-15 17:55:17 +00:00
Imad Saddik
333bfc7231
chore: undo changes in ChatScreenProcessingInfo
2026-03-15 17:53:31 +00:00
Imad Saddik
c6c63786c2
chore: update webui build output
2026-03-15 17:49:34 +00:00
Imad Saddik
de04a9b0e6
style: reset the width of the processing info div
2026-03-15 17:48:16 +00:00
Imad Saddik
715ba4ee85
chore: update webui build output
2026-03-15 17:40:36 +00:00
Imad Saddik
fa7d3a96c5
fix: keep the container spanning the whole width to fix scroll bar issue
2026-03-15 17:39:19 +00:00
Imad Saddik
c399ec3c46
chore: update webui build output
2026-03-15 17:36:11 +00:00
Imad Saddik
72c1928dc9
refactor: move widthClasses.class to the container div
2026-03-15 17:34:31 +00:00
Johannes Gäßler
ae40cd27c8
CUDA: limit number of FA stream-k CUDA blocks ( #20586 )
2026-03-15 18:30:47 +01:00
Imad Saddik
0cd953eea2
chore: update webui build output
2026-03-15 17:13:14 +00:00
Imad Saddik
6074619ba4
style: restore class for checkbox labels
2026-03-15 17:11:58 +00:00
Imad Saddik
297abf8450
chore: update webui build output
2026-03-15 17:09:59 +00:00
Imad Saddik
d4034eff07
fix: update chatWidthClasses to use autoChatWidth configuration
2026-03-15 17:08:43 +00:00
Imad Saddik
b73209694d
chore: update webui build output
2026-03-15 17:03:58 +00:00
Imad Saddik
2630c27754
refactor: simplify chatWidthClasses getter logic and remove widthClasses.class
2026-03-15 17:02:41 +00:00
Imad Saddik
1a6f21f25c
chore: revert package-lock.json to match master
2026-03-15 16:56:41 +00:00
Imad Saddik
95be04617e
chore: update webui build output
2026-03-15 16:56:08 +00:00
Imad Saddik
2836834801
refactor: remove anything related to the custom chat width setting
2026-03-15 16:54:43 +00:00
Imad Saddik
20a8227933
chore: update webui build output
2026-03-15 16:44:35 +00:00
Imad Saddik
89647d5daf
chore: downgrade @lucide/svelte version and remove custom chat width component
2026-03-15 16:43:18 +00:00
Pascal
ceef6b5233
ggml: avoid creating CUDA context during device init ( #20595 )
2026-03-16 00:42:56 +08:00
Adrien Gallouët
07c6a59b4f
vendor : update cpp-httplib to 0.38.0 ( #20578 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-15 17:30:06 +01:00
MoonShadow
8b7d340b6f
ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain ( #20536 )
...
* ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain
On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain
returns hipErrorInvalidValue because the hint is not applicable to UMA systems.
The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on
APU systems such as AMD Strix Halo (gfx1151).
Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it
without error checking and clear any resulting error with hipGetLastError().
Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory
issues on APU systems, and store totalGlobalMem in device info.
Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits
hipMallocManaged to ~64GB regardless of available system RAM. A fix has been
submitted upstream: https://github.com/ROCm/rocm-systems/pull/4077
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ggml/hip: remove unrelated changes, keep only hipMemAdviseSetCoarseGrain fix
---------
Co-authored-by: moonshadow-25 <moonshadow-25@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:23:58 +01:00
Eric Hsieh
559646472d
fix: prevent nullptr dereference ( #20552 )
2026-03-15 16:51:49 +01:00
Sigbjørn Skjæret
cf45437d35
codeowners : use teams ( #20526 )
...
* use teams
* update
* update
* update
* update
* update
2026-03-15 14:26:10 +01:00
Georgi Gerganov
9cd4ebcfb1
ci : split build.yml + server.yml ( #20546 )
...
* ci : split build.yml
* cont : split server.yml
* cont : reduce paths
* cont : split build-android.yml + update paths
* ci : make msys workflows manual (#20588 )
* ci : make cross-build workflows manual (#20585 )
* cont : fix release paths
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-15 15:11:17 +02:00
Sigbjørn Skjæret
89d0aec042
convert : support contiguous method on lora tensors ( #20489 )
2026-03-15 12:15:12 +01:00
Bartowski
b9da4444df
ggml : guard against sumq2 being 0 in IQ4_NL ( #20460 )
2026-03-15 10:47:28 +02:00
PikaPikachu
617db241aa
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode ( #19478 )
...
* mmvq: add RDNA3/RDNA4-specific parameter table (nwarps=8, rows=1)
* mmvq: add dedicated RDNA3 parameter table
* mmvq: exclude RDNA3.5 (gfx1150/1151) from RDNA3 table
2026-03-15 08:33:39 +01:00
Ruben Ortlam
1a3d8edbba
vulkan: use graphics queue on AMD ( #20551 )
...
* vulkan: use graphics queue on AMD for slightly better performance
* disable async transfer queue on AMD
2026-03-15 08:18:54 +01:00
sprayandwipe
6b10a82c00
kv-cache : fix reading llama_kv_cell_ext during state read ( #20273 )
...
Co-authored-by: sid <sid@ragingfist.net>
2026-03-15 09:11:19 +02:00
Michael Wand
d23355afc3
model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support ( #20506 )
2026-03-14 22:44:42 +01:00
Georgi Gerganov
b30a5fdf37
metal : add FA specialization for HSK = 320, HSV = 256 ( #20549 )
2026-03-14 23:15:47 +02:00
Georgi Gerganov
b4768955c4
ci : move self-hosted workflows to separate files ( #20540 )
2026-03-14 23:15:35 +02:00
Gerard Guillemas Martos
fc350fdf96
docker : force Python 3.13 in Vulkan container ( #20530 )
...
* ci: force Python 3.13 in Vulkan container
* remove unnecessary `update-alternatives` line
2026-03-14 21:37:09 +01:00
Eve
3a6f059909
ci : try to optimize some jobs ( #20521 )
...
* force arm version to test
* run on either x86 or arm if we can help it; this only works for runs without ccache
* readd other jobs
* remove ccache
2026-03-14 20:27:52 +01:00
Max Krasnyansky
609ea50026
hexagon: Q4_0 and MXFP4 repack fixes ( #20527 )
...
* hexagon: fix tail corruption with row sizes that are not a multiple of 256
* hexagon: use different stride for repacking partial blocks
* hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks
The previous commit changed the repacking to use even:odd (0:1,2:3,...) packing
instead of the original (0:128,1:129,...) packing in order to fix tail corruption.
Since the mm kernels already deal with partial tails, we can use even:odd
packing only for the last block.
This avoids the performance penalty of having to shuffle to zip the elements
in the common case.
* hex-mm: update rmpy x8 for better optimizations
* hex-mm: tighten supported MUL_MAT checks to avoid spurious failures
* hex-mm: use vzero to init accumulators
* hex-mm: properly call partial rmpy_x8
2026-03-14 11:09:08 -07:00
Georgi Gerganov
9f774e45ee
ci : reduce webgpu tests timeout to 900s ( #20538 )
...
[no ci]
2026-03-14 17:08:26 +02:00
Xuan-Son Nguyen
94d0262277
mtmd: add llama-mtmd-debug binary ( #20508 )
...
* mtmd: add llama-mtmd-debug binary
* adapt
* fixes
* fix compile error
* fix windows compile error
* rm legacy clip_debug_encode()
* add MTMD_API to fix build
2026-03-14 15:52:29 +01:00
Neo Zhang
a93c0ef0fa
add op gated_delta_net ( #20455 )
2026-03-14 22:01:57 +08:00
Chedrian07
710878a7dd
webui: restore code preview iframe origin isolation ( #20477 )
2026-03-14 11:28:28 +01:00
Adrien Gallouët
0685848bc6
scripts : remove get-wikitext-103.sh ( #20543 )
...
It doesn't work and no one seems to use it.
$ wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip
HTTP request sent, awaiting response... 301 Moved Permanently
Location: unspecified
ERROR: Redirection (301) without location.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-14 11:22:04 +01:00