Francisco Herrera
8fc17493c3
gguf-split : clarify operation of gguf-split ( #19749 )
...
* clarify operation of gguf-split
so that you don't have to find out by trial and error
* formatting
2026-03-25 13:12:50 +02:00
Johannes Gäßler
36dafba5c4
llama: fix llama-model-saver ( #20503 )
...
* llama : add fd-based model loading via llama_model_load_from_fd
* llama : address review feedback for fd-based model loading
* llama : use FILE pointer instead of fd in public API
* llama : use FILE pointer consistently, address review feedback
* fixup
* fix tensor names
* fix llama-model-saver
* roundtrip tests
* fixup
* refactor tests
* fix prints
* fix model saving
* fix CI, disable Chameleon
* print seed
---------
Co-authored-by: Siddhesh2377 <siddheshsonar2377@gmail.com>
2026-03-25 12:53:16 +02:00
Aleksander Grygier
69e0ecef06
webui: Fix editing assistant message without branching ( #20944 )
...
* fix: Editing assistant response without branching
* chore: update webui build output
2026-03-25 12:47:33 +02:00
Pascal
062cca58fc
Add SLEEPING status to the WebUI model selector ( #20949 )
...
* webui: handle sleeping model status, fix favourite -> favorite
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: fix optional event parameter in sleeping model onclick
* typo
* webui: restore orange sleeping indicator dot with hover unload
* chore: update webui build output
* webui: move stopPropagation into ActionIcon onclick, remove svelte-ignore
* chore: update webui build output
* webui: fix favourite -> favorite (UK -> US spelling) everywhere
Address review feedback from WhyNotHugo
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-25 11:02:32 +01:00
yikechayedan
406f4e3f61
android : fix-pointer-dangling ( #20974 )
2026-03-25 11:51:26 +02:00
Neo Zhang
53dc8b59bf
sycl : fix wrong variable check by assert ( #20903 )
...
* fix wrong variable check by assert
* use GGML api
2026-03-25 11:48:37 +02:00
Aleksander Grygier
94f7d16712
chore: update webui build output
2026-03-25 10:31:37 +01:00
Aleksander Grygier
7c520102ca
Merge remote-tracking branch 'ngxson/master' into allozaur/server_tools
2026-03-25 10:26:34 +01:00
Sigbjørn Skjæret
403c9c9cef
ci : bump gguf publish python version ( #20982 )
2026-03-25 11:04:59 +02:00
Sigbjørn Skjæret
8fc85db9d2
ci : limit requirements versions ( #20980 )
...
* set requests version
* limit versions outside requirements
2026-03-25 10:55:37 +02:00
Dowon
3a60d06ad9
convert : register Qwen3Model architecture ( #20967 )
2026-03-25 10:37:59 +02:00
Ravi Panchumarthy
abd86ef175
docs : Update OpenVINO backend docs ( #20968 )
...
* OpenVINO doc updates
* Update docs/backend/OPENVINO.md
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
---------
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 10:33:51 +02:00
Georgi Gerganov
9f102a1407
models : move the token embedding norms to the first layer ( #20943 )
...
* models : move the token embedding norms to the first layer
* cont : fix LLM_TENSOR_CONV1D + fix il indexing
2026-03-24 17:00:30 +02:00
Aman Gupta
3fc6f1aed1
ggml-backend: re-enable graph reuse with pipeline parallelism ( #20927 )
2026-03-24 20:47:00 +08:00
Alessandro de Oliveira Faria (A.K.A.CABELO)
29771a0a4c
vendor : update cpp-httplib to 0.39.0 ( #20933 )
2026-03-24 13:33:33 +01:00
Adrien Gallouët
42ebce3beb
common : fix get_gguf_split_info ( #20946 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
BlueMöhre
a94fdb090a
WebUI: fix edit msg form textarea height ( #20830 )
...
* autoresize textarea on mount
* allow textarea to grow to same height as rendered messages
* add UI build file
2026-03-24 13:17:45 +01:00
Aleksander Grygier
79999ffd01
Merge remote-tracking branch 'ngxson/xsn/server_tools' into allozaur/server_tools
2026-03-24 11:19:57 +01:00
Adrien Gallouët
c9dc43333f
readme : clarify MODEL_ENDPOINT usage ( #20941 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 10:35:07 +01:00
Adrien Gallouët
2d2d9c2062
common : add a WARNING for HF cache migration ( #20935 )
...
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
nuri
92080b4396
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops ( #20930 )
...
Co-authored-by: nryoo <nryoo@nryooui-MacBookPro.local>
2026-03-24 10:13:07 +02:00
Georgi Gerganov
342d6125bc
metal : add FA instantiations for HSK=512, HSV=512 ( #20902 )
2026-03-24 10:03:09 +02:00
Aaron Teo
c2e224d829
issues: add openvino backends ( #20932 )
...
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2026-03-24 14:41:10 +08:00
Adrien Gallouët
8c7957ca33
common : add standard Hugging Face cache support ( #20775 )
...
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Aman Gupta
e852eb4901
llama-fit: fix regex pattern for gate_up tensors ( #20910 )
...
* llama-fit: fix regex pattern for gate_up tensors
* Apply suggestions from code review
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-24 12:57:57 +08:00
Aldehir Rojas
312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss ( #20912 )
2026-03-23 22:21:47 -05:00
Max Krasnyansky
7cadbfce10
hexagon: general DMA and Binary Op fixes for large strides ( #20918 )
...
* hex-dma: make chained dma the default to handle newer models
This also includes some new instrumentation that we can remove later.
* hexagon: add uint32 dump helper
* hexagon: use single-page VTCM allocation to avoid issues with large gather ops in ssm-conv
ssm-conv uses the HVX gather instruction, and that instruction cannot handle cases where the base+offset
spans page boundaries.
* hexagon: update ssm-conv to make base-addr compute a bit easier to read
* hex-dma: use 1d mode for reshaping, it supports sizes up to 24-bits (>16MB)
* hex-bin: fix incorrect stride logic
* hexagon: make sure repack buffs are dumped for verbose > 2
* hex-bin: consistently use dma_queue_push even for dummy dst transactions
* hex-dma: start using 2d-wide mode on v75 and up
This removes the need to deal with the 16-bit limitation for the strides.
* hex-bin: cleanup kernel selection logic
* hex-bin: cleanup binary op core and fix transposed tensor handling
* snapdragon: update run-bench to use larger ubatch and fa-on
2026-03-23 15:33:49 -07:00
Max Krasnyansky
1fb2290a51
Add codeowners for scripts/snapdragon and docs/snapdragon ( #20915 )
...
* Add codeowners for scripts/snapdragon
* Also add docs/backends/snapdragon
2026-03-23 14:57:18 -07:00
lhez
1772701f99
opencl: add q6_K gemm and gemv kernels for Adreno ( #20089 )
...
* opencl: add q6_K noshuffle kernels, initial q6_K gemv, some host code
* opencl: add q6_K transpose
* opencl: fix cvt kernel name
* opencl: add call to q6_K gemv
* opencl: fix q6_K scale transpose
* opencl: fix loading for gemv q6_K, refactor
* opencl: fix transpose_8_buf kernel assignment, refactor
* opencl: refactor q6_K transpose
* opencl: add gemm_noshuffle_q6_k_f32
* opencl: fix qh loading
* opencl: refactor q6_K gemv host side, release bufs and imgs
* opencl: refactor
* opencl: fix q6_K dequant and scale selection
* opencl: workaround compiler bug, fix dump_tensor
* opencl: refactor q6_K convert kernels
* opencl: unpack transformed q6_K in get_tensor
* opencl: refactor, handle non-uniform workgroups
* opencl: support non-vector subgroup bcast
2026-03-23 12:44:18 -07:00
las7
39bf0d3c6a
rpc : RCE patch ( #20908 )
2026-03-23 19:54:57 +02:00
Xuan-Son Nguyen
bd6992180b
contrib: add "Requirements" section to PR template ( #20841 )
...
* contrib: add "Requirements" section to PR template
* typo [no ci]
* use h2, add "Additional information"
---------
Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-23 16:59:02 +01:00
Davi Henrique Linhares
fd18364755
devops: upgraded default oneAPI version ( #20731 )
2026-03-23 21:47:34 +08:00
Aleksander Grygier
11fb11b901
webui: Improve chat form positioning ( #20901 )
2026-03-23 14:30:55 +01:00
Geo Maciolek
35b662bb5d
docs: Fix typo in reasoning flag documentation ( #20780 )
...
Tested to verify - the typo is just in the docs, not the actual flag.
2026-03-23 21:24:55 +08:00
Georgi Gerganov
f93c09e267
memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta() ( #20887 )
2026-03-23 14:08:46 +02:00
Xuan Son Nguyen
8098f11f8b
Merge branch 'master' into xsn/server_tools
2026-03-23 12:37:03 +01:00
Xuan Son Nguyen
e4cc43a809
llama-gen-docs
2026-03-23 12:35:19 +01:00
Eric Zhang
841bc203e2
docs : rerun llama-gen-docs to include new CLI args ( #20892 )
2026-03-23 12:33:38 +01:00
Xuan Son Nguyen
b648215eb2
add readme mention
2026-03-23 12:32:33 +01:00
Xuan Son Nguyen
7fbf86506c
Merge branch 'master' into xsn/server_tools
2026-03-23 12:24:27 +01:00
Xuan-Son Nguyen
31a5cf4c3f
server: use httplib dynamic threads ( #20817 )
...
* server: use httplib dynamic threads
* change to n_threads_http + 1024
2026-03-23 12:22:46 +01:00
Georgi Gerganov
e32d243849
ai : update gh permissions ( #20895 )
2026-03-23 13:21:41 +02:00
Aleksander Grygier
bbb2bca322
chore: update webui build output
2026-03-23 11:53:56 +01:00
Aleksander Grygier
7fc5ba3ca8
Merge remote-tracking branch 'ngxson/xsn/server_tools' into allozaur/server_tools
2026-03-23 11:44:34 +01:00
Aleksander Grygier
3994a39675
feat: UI improvements
2026-03-23 11:41:28 +01:00
Pascal
c44a932cf4
webui: fix --webui-config-file settings not applied on load ( #20823 )
...
* webui: fix --webui-config-file settings not applied on load
* chore: update webui build output
2026-03-23 11:25:35 +01:00
Rashid Ul Islam
177c75852a
metal: add CONV_3D ( #19927 )
...
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* metal: add conv_3d backend
Rebased with master and resolved conflicts.
* Resolved issues related to changes in variable names
* kernel void kernel_upscale_bilinear_f32 was missing in my branch, added back, should pass all tests now
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-23 09:45:34 +02:00
Jhen-Jie Hong
7a0b6a635e
common/autoparser : detect reasoning markers when enable_thinking changes system prompt ( #20859 )
2026-03-23 08:35:27 +01:00
Chenguang Li
07ff000551
CANN: add RoPE cache preload before ACL graph capture ( #20747 )
...
ACL graph capture disallows host-to-device memcpy and device memory
malloc/free on the captured stream. Pre-load the RoPE cache before
capture so that:
- Host-to-device copies and allocations run on the non-captured stream
- Cache metadata is populated and memory pool is warmed up
- During capture, only on-device computations are recorded; host-side
and allocation branches are skipped
2026-03-23 15:24:06 +08:00
Dan Hoffman
cc18f965b6
fix(openvino): explicit memset in buffer_context allocation ( #20857 )
...
* fix(openvino): explicit memset in buffer_context allocation
* minor
---------
Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-23 08:05:37 +02:00