Commit Graph

8536 Commits

Author SHA1 Message Date
Adrien Gallouët 93dfbc1291
common : make LLAMA_CACHE the one cache for everything (#21009)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-26 12:04:57 +01:00
Adrien Gallouët 3cba8bba18
common : fix split model migration (#21019)
Sadly the manifest does not list all required files, i honestly thought
it was the case

Without the files listed we don't have the sha256, so if the first file
is valid, and all others have the correct size, then we can assume we
are good and do the migration...

Here my test:

    $ find /home/angt/.cache/llama.cpp
    /home/angt/.cache/llama.cpp
    /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf
    /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf
    /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf.etag
    /home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf.etag
    /home/angt/.cache/llama.cpp/manifest=angt=test-split-model-stories260K=latest.json

    $ build/bin/llama-server
    ================================================================================
    WARNING: Migrating cache to HuggingFace cache directory
      Old cache: /home/angt/.cache/llama.cpp/
      New cache: /home/angt/.cache/huggingface/hub
    This one-time migration moves models previously downloaded with -hf
    from the legacy llama.cpp cache to the standard HuggingFace cache.
    Models downloaded with --model-url are not affected.
    ================================================================================
    migrate_file: migrated angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf -> /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00001-of-00002.gguf
    migrate_file: migrated angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf -> /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00002-of-00002.gguf
    migrate_old_cache_to_hf_cache: migration complete, deleting manifest: /home/angt/.cache/llama.cpp/manifest=angt=test-split-model-stories260K=latest.json

    $ find /home/angt/.cache/llama.cpp /home/angt/.cache/huggingface
    /home/angt/.cache/llama.cpp
    /home/angt/.cache/huggingface
    /home/angt/.cache/huggingface/hub
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs/50d019817c2626eb9e8a41f361ff5bfa538757e6f708a3076cd3356354a75694
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs/7b273e1dbfab11dc67dce479deb5923fef27c39cbf56a20b3a928a47b77dab3c
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/refs
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/refs/main
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00002-of-00002.gguf
    /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00001-of-00002.gguf

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-26 12:04:37 +01:00
Michael Wand 112c78159f
ggml-cuda: Add NVFP4 dp4a kernel (#20644)
Added check for dst_t to cuda_cast template for float
Restored ggml_cuda_ue4m3_to_fp32, changed vecdot ints to int32ts
Added CUDART/HIP Check and HIP/fp8 include
Added NVFP4 to Test-backend-ops
Added hip_fp8_e4m3 to __nv_fp8_e4m3 typedef

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-26 09:54:03 +01:00
SamareshSingh 0fac87b157
imatrix : fix crash when using --show-statistics with zero counts (#19532)
* imatrix: fix crash when using --show-statistics with zero counts

Fixes division by zero that caused floating point exceptions when processing imatrix files with zero count values. Added checks to skip zero counts and handle empty activation vectors.

Fix for the bug #19190

* imatrix: lower log level for zero-count skip message to DBG
2026-03-26 08:14:36 +01:00
Yihao Wang 0a524f2404
CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (#17094)
* Refactor CUDA 2D transpose implementation to support multiple kernel types and improve parameter handling

- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.

* Refactor test cases for 2D convolution transpose to support dynamic kernel types

- Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.

* Refactor ggml_compute_forward_conv_transpose_2d to support both F16 and F32 tensor types.

* Refactor conv2d transpose kernel to use a template for kernel type, enhancing flexibility for different data types.
Update test cases to include both F16 and F32 tensor types for comprehensive coverage.

* Update ggml/src/ggml-cuda/conv2d-transpose.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Update ggml/src/ggml-cpu/ggml-cpu.c

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

* Refactor conv2d transpose implementation by removing the conv2d_transpose_params struct and dispatching with direct kernel launch.

* Enhance cpu conv2d transpose implementation by introducing a templated kernel type for improved flexibility with F16 and F32 data types.

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2026-03-26 10:19:14 +08:00
Adrien Gallouët c0159f9c1f
common : do not delete old files from the old cache when updating (#21000)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-25 22:28:04 +01:00
Saba Fallah a970515bdb
mtmd: Add DeepSeekOCR Support (#17400)
* mtmd: llama.cpp DeepSeekOCR support
init commit

* loading sam tensors

* mtmd: fix vision model processing

* deepseek-ocr clip-vit model impl

* mtmd: add DeepSeek-OCR LM support with standard attention

* mtmd: successfully runs DeepSeek-OCR LM in llama-cli

* mtmd: Fix RoPE type for DeepSeek-OCR LM.

* loading LM
testing Vision model loading

* sam warmup working

* sam erroneous return corrected

* clip-vit:  corrected cls_embd concat

* clip-vit: model convert  qkv_proj split

* corrected combining of image encoders' results

* fix: update callback for ffn_moe_weighted and add callback for attn_out in deepseek2 model

* concat image_newline and image_seperator tokens

* visual_model warmup (technically) works

* window partitioning using standard ggml ops

* sam implementation without using CPU only ops

* clip: fixed warnings

* Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr

* mtmd: fix get_rel_pos

* mtmd: fixed the wrong scaler for get_rel_pos

* image encoding technically works but the output can't be checked singe image decoding fails

* mtmd: minor changed

* mtmd: add native resolution support

* - image encoding debugged
- issues fixed mainly related wrong config like n_patches etc.
- configs need to be corrected in the converter

* mtmd: correct token order

* - dynamic resizing
- changes are concerning PR https://github.com/sfallah/llama.cpp/pull/4

* mtmd: quick fix token order

* mtmd: fix danling pointer

* mtmd: SAM numerically works

* mtmd: debug CLIP-L (vit_pre_ln)

* mtmd: debug CLIP-L & first working DeepSeek-OCR model

* mtmd : add --dsocr-mode CLI argument for DeepSeek-OCR resolution control & all native resolution modes work

* mtmd: simplify SAM patch embedding

* mtmd: adapt Pillow image resizing function

* mtmd:  simplify DeepSeek-OCR dynamic resolution preprocessing

* mtmd: remove --dsocr-mode argument

* mtmd: refactor code & remove unused helper functions

* mtmd: fix tensor names for image newlines and view separator

* clean up

* reverting automatically removed spaces

* reverting automatically removed spaces

* mtmd: fixed bad ocr check in Deepseek2 (LM)

* mtmd: support combined QKV projection in buid_vit

* using common build_attn in sam

* corrected code-branch when flash-attn disabled
enabling usage of --flash-attn option

* mtmd: minor fix

* minor formatting and style

* fixed flake8 lint issues

* minor editorconfig-check fixes

* minor editorconfig-check fixes

* mtmd: simplify get_rel_pos

* mtmd: make sam hparams configurable

* mtmd: add detailed comments for resize_bicubic_pillow

* mtmd: fixed wrong input setting

* mtmd: convert model in FP16

* mtmd: minor fix

* mtmd: remove tweak to llama-mtmd-cli & deepseek-ocr template

* fix: test-1.jpg ORC issue with small (640) resolution
setting min-resolution base (1024) max large (1280) for dynamic-resolution

* minor: editconfig-check fix

* merge with changes from https://github.com/ggml-org/llama.cpp/pull/17909
added new opt to tests.sh to disable flash-attn

* minor: editconfig-check fix

* testing deepseek-ocr
quick and dirty test script comparing results of Qwen2.5-VL vs DeepSeek-OCR

* quick and (potential) dirty merge with https://github.com/ggml-org/llama.cpp/pull/17909

* refactoring, one single builder function and static helpers

* added deepseek-ocr test to tests.sh

* minor formatting fixes

* check with fixed expected resutls

* minor formatting

* editorconfig-check fix

* merge with changes from https://github.com/ggml-org/llama.cpp/pull/18042

* minor
- added GLM-4.6V to big tests
- added missing deps for python test

* convert: minor fix

* mtmd: format code

* convert: quick fix

* convert: quick fix

* minor python formatting

* fixed merge build issue

* merge resolved
- fixed issues in convert
- tested several deepseek models

* minor fix

* minor

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* - removed clip_is_deepseekocr
- removed redundant RESIZE_ALGO_BICUBIC_PILLOW resize-algo
- simplified image-preprocessing
- removed/simplified debug functions

* - cleaning commented out code

* fixing instabilities issues reintroducing resize_bicubic_pillow

* - use f16 model for deepseek-ocr test
- ignore llama-arch test for deepseek-ocr

* rename fc_w --> mm_fc_w

* add links to OCR discussion

* cleaner loading code

* add missing .weight to some tensors

* add default jinja template (to be used by server)

* move test model to ggml-org

* rolling back upscale change

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: bluebread <hotbread70127@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-03-25 19:57:40 +01:00
Adrien Gallouët 056b50c319
common : fix verbosity setup (#20989)
The verbosity threshold was set at the end of common_params_parse_ex(),
after doing many things (like downloading files..)

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-25 19:41:01 +01:00
Adrien Gallouët f2c72b8f1f
common : fix gguf selection in common_list_cached_models (#20996)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-25 19:18:06 +01:00
uvos ec54ac13a8
ci : fix parsing of vgpr counts in hip-quality-check (#20987)
* scripts: hip: gcn-cdna-vgpr-check: fix parsing of vgpr counts when an amdclang Remark block is interlieved with another from a different process

* Return warning ignore

* obay pep8 inline double space before inline commets

* add # noqa: NP100 for other prints too

* Add script changes to cause autotrigger
2026-03-25 19:00:37 +01:00
Saba Fallah 80322ebdaf
model: codefuse-ai/F2LLM-v2 support 2026-03-25 18:33:42 +01:00
Dowon 44c51e526b
model : allow causal_attn and pooling_type on all architectures (#20973)
* models : allow causal_attn and pooling_type on all architectures

* fix: move location
2026-03-25 18:12:38 +01:00
Aparna M P 1922f87c2f
snapdragon: add missing features to WoS scripts to achieve parity with ADB scripts (#20884)
* Add missing features to WoS scripts to achieve parity with ADB scripts

* Fix line-ending in run-mtmd.ps1

Signed-off-by: Max Krasnyansky <maxk@qti.qualcomm.com>

---------

Signed-off-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2026-03-25 09:43:12 -07:00
Shreya Jain 345de3cd87
Use docker in build-android.yml (#20928)
* use docker instead of SDK separately

* fix whitespaces

* Update .github/workflows/build-android.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-25 09:36:27 -07:00
Aman Gupta 9c600bcd4b
llama-bench: print `-n-cpu-moe` when offloaded layers > 1 (#20984) 2026-03-25 21:17:27 +08:00
Masato Nakasaka b2704f9028
ci: Allow ninja to be used during unit test (#20742)
* Remove make dependency

* Added option to specify Ninja generator

* use ninja-build as default for several CI

* Revert "use ninja-build as default for several CI"

This reverts commit f552c4559b.

* changed use plain string rather than arrays

* Enabled ninja build by default for experimentation

* ci: add run.sh to test conditions to trigger GitHub CI and self-hosted runners

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Enabled ninja build by default on self-hosted envs for experimentation

* ci: revert generator to ninja instead of ninja multi-config

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: install ninja-build for self-hosted workflows

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: revert ninja from self-hosted runners

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: missed one self-hosted step

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* ci: fix windows ci errors from an errenous revert

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Added explicit build types for Ninja

Also reverted some needless change

* ci: use ninja multi-config for vulkan-x64 build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* added time command to measure build time

* Keeping some configs to use Ninja which show improvement

* minor fix based on review

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

* ci: rm `time` from custom containers

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 21:00:49 +08:00
Georgi Gerganov 3fab96cd04
ci : disable self-hosted mac jobs (#20985) 2026-03-25 14:46:40 +02:00
Xuan-Son Nguyen 914eb5ff0c
jinja: fix macro with kwargs (#20960)
* jinja: fix macro with kwargs

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix newline problem

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-25 12:22:48 +01:00
Francisco Herrera 8fc17493c3
gguf-split : clarify operation of gguf-split (#19749)
* clarify operation of gguf-split

so that you don't have to find out by trial and error

* formatting
2026-03-25 13:12:50 +02:00
Johannes Gäßler 36dafba5c4
llama: fix llama-model-saver (#20503)
* llama : add fd-based model loading via llama_model_load_from_fd

* llama : address review feedback for fd-based model loading

* llama : use FILE pointer instead of fd in public API

* llama : use FILE pointer consistently, address review feedback

* fixup

* fix tensor names

* fix llama-model-saver

* roundtrip tests

* fixup

* refactor tests

* fix prints

* fix model saving

* fix CI, disable Chameleon

* print seed

---------

Co-authored-by: Siddhesh2377 <siddheshsonar2377@gmail.com>
2026-03-25 12:53:16 +02:00
Aleksander Grygier 69e0ecef06
webui: Fix editing assistant message without branching (#20944)
* fix: Editing assistant response without branching

* chore: update webui build output
2026-03-25 12:47:33 +02:00
Pascal 062cca58fc
Add SLEEPING status to the WebUI model selector (#20949)
* webui: handle sleeping model status, fix favourite -> favorite

* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/models/ModelsSelectorOption.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: fix optional event parameter in sleeping model onclick

* typo

* webui: restore orange sleeping indicator dot with hover unload

* chore: update webui build output

* webui: move stopPropagation into ActionIcon onclick, remove svelte-ignore

* chore: update webui build output

* webui: fix favourite -> favorite (UK -> US spelling) everywhere

Address review feedback from WhyNotHugo

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2026-03-25 11:02:32 +01:00
yikechayedan 406f4e3f61
android : fix-pointer-dangling (#20974) 2026-03-25 11:51:26 +02:00
Neo Zhang 53dc8b59bf
sycl : fix wrong variable check by assert (#20903)
* fix wrong variable check by assert

* use GGML api
2026-03-25 11:48:37 +02:00
Sigbjørn Skjæret 403c9c9cef
ci : bump gguf publish python version (#20982) 2026-03-25 11:04:59 +02:00
Sigbjørn Skjæret 8fc85db9d2
ci : limit requirements versions (#20980)
* set requests version

* limit versions outside requirements
2026-03-25 10:55:37 +02:00
Dowon 3a60d06ad9
convert : register Qwen3Model architecture (#20967) 2026-03-25 10:37:59 +02:00
Ravi Panchumarthy abd86ef175
docs : Update OpenVINO backend docs (#20968)
* OpenVINO doc updates

* Update docs/backend/OPENVINO.md

Co-authored-by: Aaron Teo <taronaeo@gmail.com>

---------

Co-authored-by: Aaron Teo <taronaeo@gmail.com>
2026-03-25 10:33:51 +02:00
Georgi Gerganov 9f102a1407
models : move the token embedding norms to the first layer (#20943)
* models : move the token embedding norms to the first layer

* cont : fix LLM_TENSOR_CONV1D + fix il indexing
2026-03-24 17:00:30 +02:00
Aman Gupta 3fc6f1aed1
ggml-backend: re-enable graph reuse with pipeline parallelism (#20927) 2026-03-24 20:47:00 +08:00
Alessandro de Oliveira Faria (A.K.A.CABELO) 29771a0a4c
vendor : update cpp-httplib to 0.39.0 (#20933) 2026-03-24 13:33:33 +01:00
Adrien Gallouët 42ebce3beb
common : fix get_gguf_split_info (#20946)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 13:33:14 +01:00
BlueMöhre a94fdb090a
WebUI: fix edit msg form textarea height (#20830)
* autoresize textarea on mount

* allow textarea to grow to same height as rendered messages

* add UI build file
2026-03-24 13:17:45 +01:00
Adrien Gallouët c9dc43333f
readme : clarify MODEL_ENDPOINT usage (#20941)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 10:35:07 +01:00
Adrien Gallouët 2d2d9c2062
common : add a WARNING for HF cache migration (#20935)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 09:24:39 +01:00
nuri 92080b4396
metal : add FLOOR, CEIL, ROUND, TRUNC unary ops (#20930)
Co-authored-by: nryoo <nryoo@nryooui-MacBookPro.local>
2026-03-24 10:13:07 +02:00
Georgi Gerganov 342d6125bc
metal : add FA instantiations for HSK=512, HSV=512 (#20902) 2026-03-24 10:03:09 +02:00
Aaron Teo c2e224d829
issues: add openvino backends (#20932)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2026-03-24 14:41:10 +08:00
Adrien Gallouët 8c7957ca33
common : add standard Hugging Face cache support (#20775)
* common : add standard Hugging Face cache support

- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Check with the quant tag

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Cleanup

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Improve error handling and report API errors

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Restore common_cached_model_info and align mmproj filtering

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Prefer main when getting cached ref

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use cached files when HF API fails

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Use final_path..

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* Check all inputs

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-03-24 07:30:33 +01:00
Aman Gupta e852eb4901
llama-fit: fix regex pattern for gate_up tensors (#20910)
* llama-fit: fix regex pattern for gate_up tensors

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-24 12:57:57 +08:00
Aldehir Rojas 312d870a89
common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912) 2026-03-23 22:21:47 -05:00
Max Krasnyansky 7cadbfce10
hexagon: general DMA and Binary Op fixes for large strides (#20918)
* hex-dma: make chained dma the default to handle newer models

This also includes some new instrumentation that we can remove later.

* hexagon: add uint32 dump helper

* hexagon: use single-page VTCM allocation to avoid issues with large gather ops in ssm-conv

ssm-conv uses HVX gather instruction and that instruction cannot handle cases where the base+offset
spans page boundaries.

* hexagon: update ssm-conv to make base-addr compute a bit easier to read

* hex-dma: use 1d mode for reshaping, it supports sizes up to 24-bits (>16MB)

* hex-bin: fix incorrect stride logic

* hexagon: make sure repack buffs are dumped for verbose > 2

* hex-bin: consistently use dma_queue_push even for dummy dst transactions

* hex-dma: start using 2d-wide mode on v75 and up

The removes the need to deal with the 16-bit limitaion for the strides.

* hex-bin: cleanup kernel selection logic

* hex-bin: cleanup binary op core and fix transposed tensor handling

* snapdragon: update run-bench to use larger ubatch and fa-on
2026-03-23 15:33:49 -07:00
Max Krasnyansky 1fb2290a51
Add codeowners for scripts/snapdragon and docs/snapdragon (#20915)
* Add codeowners for scripts/snapdragon

* Also add docs/backends/snapdragon
2026-03-23 14:57:18 -07:00
lhez 1772701f99
opencl: add q6_K gemm and gemv kernels for Adreno (#20089)
* opencl: add q6_K noshuffle kernels, initial q6_K gemv, some host code

* opencl: add q6_K transpose

* opencl: fix cvt kernel name

* opencl: add call to q6_K gemv

* opencl: fix q6_K scale transpose

* opencl: fix loading for gemv q6_K, refactor

* opencl: fix transpose_8_buf kernel assignment, refactor

* opencl: refactor q6_K transpose

* opencl: add gemm_noshuffle_q6_k_f32

* opencl: fix qh loading

* opencl: refactor q6_K gemv host side, release bufs and imgs

* opencl: refactor

* opencl: fix q6_K dequant and scale selection

* opencl: workaround compiler bug, fix dump_tensor

* opencl: refactor q6_K convert kernels

* opencl: unpack transformed q6_K in get_tensor

* opencl: refactor, handle non-uniform workgroups

* opencl: support non-vector subgroup bcast
2026-03-23 12:44:18 -07:00
las7 39bf0d3c6a
rpc : RCE patch (#20908) 2026-03-23 19:54:57 +02:00
Xuan-Son Nguyen bd6992180b
contrib: add "Requirements" section to PR template (#20841)
* contrib: add "Requirements" section to PR template

* typo [no ci]

* use h2, add "Additional information"

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-03-23 16:59:02 +01:00
Davi Henrique Linhares fd18364755
devops: upgraded default oneAPI version (#20731) 2026-03-23 21:47:34 +08:00
Aleksander Grygier 11fb11b901
webui: Improve chat form positioning (#20901) 2026-03-23 14:30:55 +01:00
Geo Maciolek 35b662bb5d
docs: Fix typo in reasoning flag documentation (#20780)
Tested to verify - the typo is just in the docs, not the actual flag.
2026-03-23 21:24:55 +08:00
Georgi Gerganov f93c09e267
memory : fix seq_id bounds in llama_memory_recurrent::state_read_meta() (#20887) 2026-03-23 14:08:46 +02:00