Xuejun Zhai
37f6bca87b
Merge branch 'dev_backend_openvino' into xuejun/ov-bk-add-func-is-splited-model
2026-03-18 13:21:51 +08:00
Mustafa Cavus
c397b1cfac
Thread safety per request only
2026-03-18 11:20:11 +08:00
Xuejun Zhai
fbc3128c17
Infer and propagate dynamic-dimension indices for all tensors in the GGML graph in the compute_model_outputs() API
2026-03-18 11:10:29 +08:00
Neo Zhang
b6c83aad55
[SYCL] enhance UPSCALE to support all UT cases ( #20637 )
* [SYCL] enhance UPSCALE to support more cases
* rm test case result of SYCL1
2026-03-17 10:01:52 +08:00
Piotr Wilkin (ilintar)
2e4a6edd4a
tools/server: support refusal content for Responses API ( #20285 )
* Support refusal content for Responses API
* Update tools/server/server-common.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tools/server/server-common.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 01:42:04 +01:00
Xuan-Son Nguyen
d34ff7eb5b
model: mistral small 4 support ( #20649 )
* model: mistral small 4 support
* fix test
* fix test (2)
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* change newline
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-17 00:31:14 +01:00
Georgi Gerganov
45172df4d6
ci : disable AMX jobs ( #20654 )
[no ci]
2026-03-16 22:38:59 +02:00
Georgi Gerganov
9b342d0a9f
benches : add Nemotron 3 Nano on DGX Spark ( #20652 )
[no ci]
2026-03-16 21:50:43 +02:00
Sigbjørn Skjæret
55e87026f7
tests : write to binary buffer to avoid newline translation in jinja -py [no ci] ( #20365 )
2026-03-16 20:40:22 +01:00
Martin Klacer
cf21cdf36c
kleidiai: add data type check to get_tensor_traits ( #20639 )
* kleidiai: add data type check to get_tensor_traits
* Added check for F16 data type into get_tensor_traits path with input data
not in ggml_backend_cpu_kleidiai_buffer_type format (unsupported for Q4/8)
Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Change-Id: I9aca4b9b8d669d35db6f1dbcc4e080b1919b1de7
* updated ggml/src/ggml-cpu/kleidiai/kleidiai.cpp
updated kleidiai.cpp file as per suggestion
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: Martin Klacer <martin.klacer@arm.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-16 21:25:54 +02:00
Sigbjørn Skjæret
0ed992973b
ci : update labeler ( #20629 )
2026-03-16 20:24:20 +01:00
Aldehir Rojas
1bbec6a75d
jinja : add capability check for object args ( #20612 )
2026-03-16 17:43:14 +01:00
Georgi Gerganov
f47a246a08
sync : ggml
2026-03-16 17:22:06 +02:00
Georgi Gerganov
c0ccbd1f86
ggml : try to fix arm build (whisper/0)
2026-03-16 17:22:06 +02:00
David366AI
f6da02c3f2
ggml : extend im2col f16 (ggml/1434)
* examples/yolo: fix load_model memory leak
* fix/issue-1433 ggml_compute_forward_im2col_f16 assert error
* fix/issue-1433
2026-03-16 17:22:06 +02:00
Pascal
dddca026bf
webui: add model information dialog to router mode ( #20600 )
* webui: add model information dialog to router mode
* webui: add "Available models" section header in model list
* webui: remove nested scrollbar from chat template in model info dialog
* chore: update webui build output
* feat: UI improvements
* refactor: Cleaner rendering + UI docs
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-16 15:38:11 +01:00
Aman Gupta
3c8521c4f5
llama-graph: replace cont with reshape for alpha in qwen35 ( #20640 )
2026-03-16 22:07:13 +08:00
Aleksander Grygier
67a2209fab
webui: Add MCP CORS Proxy detection logic & UI ( #20167 )
* refactor: MCP store cleanup
* feat: Add MCP proxy availability detection
* fix: Sidebar icon
* chore: update webui build output
* chore: Formatting
* chore: update webui build output
* chore: Update package lock
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
2026-03-16 13:05:36 +01:00
Pascal
d65c4f2dc9
Fix model selector locked to first loaded model with multiple models ( #20580 )
* webui: fix model selector being locked to first loaded model
When multiple models are loaded, the auto-select effect would re-fire
on every loadedModelIds change, overriding the user's manual model
selection. Guard with selectedModelId so auto-select only kicks in
when no model is chosen yet.
* chore: update webui build output
2026-03-16 12:04:06 +01:00
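The guard described in this commit can be modeled as a small pure function. What follows is a hypothetical sketch written in C++. The actual webui code is not reproduced here, and the names `auto_select`, `selected_model_id` and `loaded_model_ids` are illustrative only. The function runs whenever the list of loaded models changes. It falls back to the first loaded model only when nothing is selected yet.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the auto-select effect: called every time the list
// of loaded model IDs changes, it returns the model that should be selected.
std::optional<std::string> auto_select(
        const std::optional<std::string> & selected_model_id,
        const std::vector<std::string> & loaded_model_ids) {
    // Guard: a model the user already chose is never overridden.
    if (selected_model_id.has_value()) {
        return selected_model_id;
    }
    // No selection yet: fall back to the first loaded model, if any.
    if (!loaded_model_ids.empty()) {
        return loaded_model_ids.front();
    }
    // Nothing loaded and nothing selected.
    return std::nullopt;
}
```

Without the first `if`, every change to the loaded list would reset the selection to the first model. That is the "locked to first loaded model" behavior the commit fixes.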
Woof Dog
d8c331c0af
webui: use date in more human-readable exported filename ( #19939 )
* webui: use date in exported filename
Move conversation naming and export to utils
update index.html.gz
* webui: move literals to message export constants file
* webui: move export naming and download back to the conversation store
* chore: update webui build output
* webui: add comments to some constants
* chore: update webui build output
2026-03-16 11:18:13 +01:00
Ruben Ortlam
46dba9fce8
vulkan: fix flash attention dot product precision ( #20589 )
2026-03-16 10:45:49 +01:00
Sigbjørn Skjæret
de8f01c2d7
model : wire up Nemotron-H tensors for NVFP4 support ( #20561 )
* wire up Nemotron-H tensors for NVFP4 support
* add ssm tensors
* alignment
2026-03-16 09:19:16 +01:00
Richard Davison
079e5a45f0
convert : support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization ( #20539 )
* support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization
* cleanup
* fallback
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-16 09:18:47 +01:00
Masato Nakasaka
d3936498a3
common : fix iterator::end() dereference ( #20445 )
2026-03-16 08:50:38 +02:00
Xuejun Zhai
a528765b7d
Add function description
2026-03-15 23:31:27 -07:00
Xuejun Zhai
eb5dc53a82
Fix error in test ops
2026-03-15 22:08:44 -07:00
Aman Gupta
34818ea6c0
CUDA: GDN hide memory latency ( #20537 )
2026-03-16 11:41:45 +08:00
Xuejun Zhai
813fe5f638
Add member func named is_splited_model()
2026-03-15 19:07:05 -07:00
Xuejun Zhai
2b6d2daa6a
Add interface is_model_splitted() to check whether the c-graph is split
2026-03-15 19:00:13 -07:00
Piotr Wilkin (ilintar)
9e2e2198b0
tools/cli: fix disable reasoning ( #20606 )
2026-03-15 22:40:53 +01:00
Georgi Gerganov
88915cb55c
server : fix wait in test_cancel_requests() test ( #20601 )
* server : fix wait in test_cancel_requests() test
* codeowners : add team for server tests
2026-03-15 20:54:37 +02:00
Sigbjørn Skjæret
ebbf544ed1
sycl : fix for untransposed GDA recurrent state ( #20583 )
2026-03-15 19:10:15 +01:00
Sigbjørn Skjæret
b91d7dfe5b
ci : only save openvino caches on github-hosted master ( #20593 )
* only save openvino ccache on master
* disable toolkit cache if self-hosted
* only cache on github-hosted runners
* remove toolkit cache [no ci]
2026-03-15 18:58:13 +01:00
Johannes Gäßler
ae40cd27c8
CUDA: limit number of FA stream-k CUDA blocks ( #20586 )
2026-03-15 18:30:47 +01:00
Pascal
ceef6b5233
ggml: avoid creating CUDA context during device init ( #20595 )
2026-03-16 00:42:56 +08:00
Adrien Gallouët
07c6a59b4f
vendor : update cpp-httplib to 0.38.0 ( #20578 )
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-15 17:30:06 +01:00
MoonShadow
8b7d340b6f
ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain ( #20536 )
* ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain
On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain
returns hipErrorInvalidValue because the hint is not applicable to UMA systems.
The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on
APU systems such as AMD Strix Halo (gfx1151).
Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it
without error checking and clear any resulting error with hipGetLastError().
Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory
issues on APU systems, and store totalGlobalMem in device info.
Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits
hipMallocManaged to ~64GB regardless of available system RAM. A fix has been
submitted upstream: https://github.com/ROCm/rocm-systems/pull/4077
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* ggml/hip: remove unrelated changes, keep only hipMemAdviseSetCoarseGrain fix
---------
Co-authored-by: moonshadow-25 <moonshadow-25@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-15 17:23:58 +01:00
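The commit turns a fatal check into a soft one by calling the hint without error checking and then clearing the last error. What follows is a minimal, hypothetical sketch of that pattern. The real HIP calls (`hipMemAdvise` with `hipMemAdviseSetCoarseGrain`, and `hipGetLastError`) are replaced by mock functions so the sketch compiles without ROCm. The names `mock_mem_advise_coarse_grain`, `alloc_strict` and `alloc_soft` are illustrative, not the actual ggml code.

```cpp
#include <cassert>

// Mocked status codes standing in for hipSuccess / hipErrorInvalidValue.
enum class Status { Success, InvalidValue };

// Mocked runtime "last error" slot.
static Status g_last_error = Status::Success;

// Mock of the coarse-grain memory hint: on a UMA (APU/iGPU) device the hint
// is rejected with InvalidValue, which is also recorded as the last error.
Status mock_mem_advise_coarse_grain(bool is_uma) {
    Status s = is_uma ? Status::InvalidValue : Status::Success;
    if (s != Status::Success) {
        g_last_error = s;
    }
    return s;
}

// Mock of hipGetLastError(): return the last error and reset it to Success.
Status mock_get_last_error() {
    Status s = g_last_error;
    g_last_error = Status::Success;
    return s;
}

// Old behavior: the hint is treated as mandatory, so a rejection is fatal.
bool alloc_strict(bool is_uma) {
    return mock_mem_advise_coarse_grain(is_uma) == Status::Success;
}

// New behavior: the hint is optional. Its result is ignored and the
// recorded error is cleared so later error checks do not see it.
bool alloc_soft(bool is_uma) {
    (void) mock_mem_advise_coarse_grain(is_uma);
    (void) mock_get_last_error();
    return true;
}
```

Clearing the error is needed as well as ignoring the return value. Otherwise a later last-error check, such as one after a kernel launch, could report the stale hint failure as if the later call had failed.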
Eric Hsieh
559646472d
fix: prevent nullptr dereference ( #20552 )
2026-03-15 16:51:49 +01:00
Sigbjørn Skjæret
cf45437d35
codeowners : use teams ( #20526 )
* use teams
* update
* update
* update
* update
* update
2026-03-15 14:26:10 +01:00
Georgi Gerganov
9cd4ebcfb1
ci : split build.yml + server.yml ( #20546 )
* ci : split build.yml
* cont : split server.yml
* cont : reduce paths
* cont : split build-android.yml + update paths
* ci : make msys workflows manual (#20588 )
* ci : make cross-build workflows manual (#20585 )
* cont : fix release paths
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-15 15:11:17 +02:00
Sigbjørn Skjæret
89d0aec042
convert : support contiguous method on lora tensors ( #20489 )
2026-03-15 12:15:12 +01:00
Bartowski
b9da4444df
ggml : guard against sumq2 being 0 in IQ4_NL ( #20460 )
2026-03-15 10:47:28 +02:00
PikaPikachu
617db241aa
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode ( #19478 )
* mmvq: add RDNA3/RDNA4-specific parameter table (nwarps=8, rows=1)
* mmvq: add dedicated RDNA3 parameter table
* mmvq: exclude RDNA3.5 (gfx1150/1151) from RDNA3 table
2026-03-15 08:33:39 +01:00
Ruben Ortlam
1a3d8edbba
vulkan: use graphics queue on AMD ( #20551 )
* vulkan: use graphics queue on AMD for slightly better performance
* disable async transfer queue on AMD
2026-03-15 08:18:54 +01:00
sprayandwipe
6b10a82c00
kv-cache : fix reading llama_kv_cell_ext during state read ( #20273 )
Co-authored-by: sid <sid@ragingfist.net>
2026-03-15 09:11:19 +02:00
Michael Wand
d23355afc3
model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support ( #20506 )
2026-03-14 22:44:42 +01:00
Georgi Gerganov
b30a5fdf37
metal : add FA specialization for HSK = 320, HSV = 256 ( #20549 )
2026-03-14 23:15:47 +02:00
Georgi Gerganov
b4768955c4
ci : move self-hosted workflows to separate files ( #20540 )
2026-03-14 23:15:35 +02:00
Gerard Guillemas Martos
fc350fdf96
docker : force Python 3.13 in Vulkan container ( #20530 )
* ci: force Python 3.13 in Vulkan container
* remove unnecessary `update-alternatives` line
2026-03-14 21:37:09 +01:00
Eve
3a6f059909
ci : try to optimize some jobs ( #20521 )
* force arm version to test
* run on either x86 or arm if we can help it; this only works for runs without ccache
* re-add other jobs
* remove ccache
2026-03-14 20:27:52 +01:00