Commit Graph

7999 Commits

Author SHA1 Message Date
Yuri Khrustalev c053e18a66
chat: Add LFM2 tool handling (#16763)
* Add LFM2 tool handling

* fmt

* Apply suggestion from @ykhrustalev
2025-10-27 23:54:01 +01:00
Xuan-Son Nguyen e1ab084803
mtmd : fix idefics3 preprocessing (#16806)
* mtmd : fix idefics3 preprocessing

* disable granite test

* fix test for granite
2025-10-27 23:12:16 +01:00
bssrdf 6d12288037 WIP: fixed a bug in cpy transpos index computation 2025-10-27 17:32:03 -04:00
Diego Devesa 5a4ff43e7d
llama : disable pipeline parallelism if compute buffer allocation fails (#16748) 2025-10-27 21:51:28 +01:00
Acly 10640e31aa
ggml : fix interpolate with align-corners and ne=1 (#16700)
* ggml : fix interpolate with align-corners and ne=1

* avoid division by zero if one of the spatial dimensions is 1
* cpu, cuda, opencl returned correct result anyway due to clamp
* vulkan didn't clamp for align-corners so results were broken

* fix clang warning
2025-10-27 21:50:22 +01:00
Johannes Gäßler 80d28f104c
HIP: fix AMDGPU_TARGETS, update documentation (#16803) 2025-10-27 21:39:49 +01:00
bssrdf a3784e17ad WIP: debugging cpy transpose 2025-10-27 15:09:03 -04:00
bssrdf cc327f5224 added a specialization for cuda copy op when tensor is transposed 2025-10-27 11:23:27 -04:00
Xuan-Son Nguyen c55d53acec
model : add LightOnOCR-1B model (#16764)
* model : add LightOnOCR-1B model

* add test
2025-10-27 16:02:58 +01:00
bssrdf 30990788e8 WIP 2025-10-27 08:29:20 -04:00
Johannes Gäßler 945501f5ea
llama: fix leaked buffers for mmap + split files (#16765) 2025-10-27 09:17:31 +01:00
Aman Gupta 75cbdd3fce
test-backend-ops: print failed tests at the end (#16785) 2025-10-27 09:25:10 +08:00
tamarPal 2b9bd9bf4e
sycl: add ROLL operation support (#16665)
* sycl: add ROLL operation support

- Implement ggml_sycl_roll function for F32 tensors
- Add multi-axis roll operation with SYCL kernel
- Support all 4 tensor dimensions with proper shift normalization
- Add roll.cpp and roll.hpp to SYCL backend
- Update backend dispatch and supports_op for GGML_OP_ROLL
- Tests: 17662/17662 pass with identical CPU reference results

* fix: remove trailing whitespace from roll.cpp

- Fix EditorConfig violations in ggml/src/ggml-sycl/roll.cpp
- Remove trailing spaces from lines 6, 11, 28, 47, 58, 60

* ci: retrigger

* sycl: remove wait() calls from ROLL operation

* fix: editorconfig — LF endings + final newline for roll.hpp

---------

Co-authored-by: tamarPal <tamarPal@example.com>
2025-10-27 09:20:24 +08:00
shani-f 59fc1ec8e8
sycl: add REPEAT_BACK operation support (#16734)
* SYCL repeat_back v1 — add core op + switch case

* Implement repeat_back SYCL operation and minor fixes

* Update ggml/src/ggml-sycl/repeat_back.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-sycl/repeat_back.hpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-27 09:19:50 +08:00
Aman Gupta 75d33b9302
CUDA: support for weight clamp in top-k norm (#16702) 2025-10-27 09:06:16 +08:00
Acly 3470a5c891
ggml-alloc : make gallocr prefer chunks that allow memory reuse (#16788) 2025-10-26 23:19:03 +01:00
Sigbjørn Skjæret bd562fe4f7
cuda : use fast copy when src and dst are of different type and contiguous (#16789)
* use fast copy when src and dst are contiguous and same shape

* use int64_t ne and ignore shape
2025-10-26 21:31:41 +01:00
leejet bbac6a26b2
ggml: fix cuda kernel launch configuration for k_compute_batched_ptrs to support large batch (#16744)
* fix k_compute_batched_ptrs

* add backend ops test

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* reduce the batch size

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-10-26 19:13:31 +01:00
Sigbjørn Skjæret 73a48c9790
convert : enable expert group selection for all models with it (#16691) 2025-10-26 17:21:23 +01:00
Sigbjørn Skjæret f696428ce8
graph : add clamping to ffn_moe_weights_sum to avoid div-by-zero (#16655)
* add missing norm topk bias

* use clamping instead, update number and add comment
2025-10-26 17:20:32 +01:00
Sigbjørn Skjæret 7cce4f8158
model : set res->t_embd in SmallThinker models (#16782) 2025-10-26 16:08:52 +01:00
amirai21 8d8862829c
docs : add Jamba to Text-only models list (#16778) 2025-10-26 13:01:20 +01:00
Aman Gupta f77c13b91f
CUDA: General GEMV fusion (#16715) 2025-10-26 19:28:04 +08:00
Gilad S. 3cfa9c3f12
vulkan: deduplicate Microsoft Direct3D12 devices (#16689)
* fix: deduplicate and deprioritize Microsoft Direct3D12 vulkan devices from the `vulkan-dozen` driver

* style: indent

* fix: decrease priority

* fix: switch to `||`
2025-10-26 05:37:38 +01:00
bssrdf c68fe36ae2 WIP: cleanup; enhanced test case 2025-10-25 21:57:39 -04:00
bssrdf 475f9879c5 WIP: fixed another bug 2025-10-25 20:24:14 -04:00
bssrdf 396f55831c WIP: bug fix 2025-10-25 18:14:12 -04:00
Galunid 5d195f17bc
convert : handle mmproj filename/path properly (#16760)
* convert: handle mmproj model output filename properly

* remove redundant commits

* Add model_type to gguf utility

* Use mmproj- prefix instead of suffix

* Apply CISC suggestion

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-25 20:41:36 +02:00
bssrdf 610e41ae2d still debugging 2025-10-25 11:10:39 -04:00
Shunta Saito 226f295f4d
model : set res->t_embd in PLaMo2 models (#16766) 2025-10-25 12:26:27 +02:00
Giuseppe Scrivano f90b4a8efe
vulkan: delete dead code (#16732)
ggml_vk_create_buffer_temp is not used anywhere, and it is the only
caller for ggml_vk_pool_malloc.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2025-10-25 10:59:54 +02:00
Jeff Bolz 8423d01931
vulkan: Optimize SSM_SCAN (#16645) 2025-10-25 07:04:12 +02:00
bssrdf c45df12ee7 this case is broken; to be debugged 2025-10-24 22:40:34 -04:00
bssrdf 980ddc1e87 properly use __CUDA_ARCH__ to protect the tensor path 2025-10-24 21:56:58 -04:00
compilade 5cca2542ac
convert : avoid dequantizing mxfp4 for GPT-OSS (#16756) 2025-10-24 20:52:00 -04:00
bssrdf 24b553204b WIP: fixed another bug 2025-10-24 16:53:40 -04:00
leejet 55945d2ef5
ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (#16742)
* Fix CUDA grid launch condition for large block_nums.y

* add backend ops test

* reduce test  repetitions
2025-10-24 21:39:37 +02:00
bssrdf 6c90c20cb1 WIP: bug fix 2025-10-24 15:33:57 -04:00
bssrdf be25be8ed3 WIP: debugging tensor core kernel 2025-10-24 14:24:26 -04:00
bssrdf 80a996cfc0 WIP: tensore code compiled ok 2025-10-24 11:41:11 -04:00
Aman Gupta 0bcb40b48c
CUDA: use CUB for arbitary size argsort (#16754) 2025-10-24 20:46:19 +08:00
Florian Badie 69e9ff0103
webui: support q URL parameter (#16728)
* webui: support q URL parameter

Fixes #16722
I’ve checked that it works with Firefox’s AI tools

* webui: apply suggestions from code review

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: update webui static build

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-24 14:10:29 +02:00
Daniel Bevenius 5a91109a5d
model-conversion : add trust_remote_code for orig model run [no ci] (#16751)
This commit add the trust_remote_code=True argument when loading models
using AutoConfig, AutoTokenizer, and AutoModelForCausalLM for the run
original model script.

The motivation for this is that some models require custom code to be
loaded properly, and setting trust_remote_code=True avoids a prompt
asking for user confirmation:
```console
(venv) $ make causal-run-original-model
The repository /path/to/model contains custom code which must be
executed to correctly load the model. You can inspect the repository
content at /path/to/model.

Do you wish to run the custom code? [y/N] N
```

Having this as the default seems like a safe choice as we have to clone
or download the models we convert and would be expecting to run any
custom code they have.
2025-10-24 12:02:02 +02:00
bssrdf 2715341c1d WIP: output 2025-10-23 21:29:45 -04:00
compilade f8f071fadd
convert : handle pre-quantized models (#14810)
* convert : begin handling pre-quantized models

* convert : fix conversion from FP8 for Deepseek-V3.1-Base
2025-10-23 16:31:41 -04:00
Johannes Gäßler 0bf47a1dbb
server: add memory breakdown print (#16740) 2025-10-23 21:30:17 +02:00
bssrdf 66f6d16265 WIP 2025-10-23 13:52:26 -04:00
Julien Denize dd62dcfab9
convert : Make mistral-common dependency optional (#16738)
* Make mistral-common dependency optional

* Fix typing
2025-10-23 15:54:46 +02:00
Xuan-Son Nguyen d0660f237a
mtmd-cli : allow using --jinja (#16718)
* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO
2025-10-23 15:00:49 +02:00
Prajwal B Mehendarkar fe6a9882ac
Manually link -lbsd to resolve flock symbol on AIX (#16610) 2025-10-23 19:37:31 +08:00