Aaron Teo
264f1b5187
zdnn: refactor codebase + add docs ( #16178 )
...
* zdnn: initial matmul refactor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rm static from funcs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update ggml-zdnn.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: change header files to hpp
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to common.hpp
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move mulmat forward around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rm inline from utils
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add zDNN docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-23 14:53:05 +08:00
Daniel Bevenius
0bc7cc7154
codeowners : add @danbev to model-conversion example [no ci] ( #16190 )
...
This commit adds examples/model-conversion/ to the CODEOWNERS file and
assigns myself (@danbev) as the code owner for this directory.
2025-09-23 09:13:22 +03:00
Aaron Teo
4b9f4cb0f8
devops: add s390x containers ( #15915 )
...
* devops: add s390x dockerfile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add missing ninja
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: move s390x docker into cpu docker
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: rework s390x docker
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: copy more tools
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add server build step
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove apt clean steps as distroless misses it
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove apt commands from distroless
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix shared libs in distroless
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: use correct libs path
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix shared libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add collector stage
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix missing stage ref
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix permission issue
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix unknown model loading failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: attempt at fixing model loading failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix missing ggml shared object
failure to load model
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove move shared objects
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: move libggml-cpu and blas into bin
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: finalise hardened server stage
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add cli target
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix typos
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix missing shared libraries in base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: update debian target
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: formalise llama.cpp loc
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "devops: formalise llama.cpp loc"
This reverts commit 0a7664af84.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: formalise llama.cpp loc
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0a7664af84)
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: attempt at fixing missing dir
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: attempt at making it cache the build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix copying process
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: make build dir an argument
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "devops: make build dir an argument"
This reverts commit 438698976b.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add build stage for gguf-py
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: move gguf-py installation into build stage
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: break system packages?
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add rust compiler installer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix rustc not found
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove cache mount to allow rustc to persist
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: move rustc installation to another layer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: move gguf-py installation to full stage, fix copying
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove rustc installation in build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: disable full target for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: attempting static build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: merge s390x dockerfile into cpu for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: switch to gcc image for build step
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove build essentials
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: install openblas into base target
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: go back to s390x dockerfile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove libggml and libblas
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add full target
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add break system packages
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add libjpeg
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add missing cmake dep
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: finalise docker images for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add custom openblas patch
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: use libopenblas-dev instead of libopenblas-openmp-dev
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add s390x docker build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-23 13:59:34 +08:00
Daniel Bevenius
85e72271ba
ggml-cpu : fix typo in gemm comments [no ci] ( #16189 )
2025-09-23 05:59:03 +02:00
Ed Addario
8eedcf74bc
Increase scale multiplier
2025-09-22 20:42:37 +01:00
Ed Addario
d36ee0a0a8
Add comments to explain magic numbers
2025-09-22 20:41:56 +01:00
Ed Addario
7ba6001ec8
Simplify candidates sorting
2025-09-22 20:11:54 +01:00
Ed Addario
d79ade2e8e
Adjust for small vector size
2025-09-22 20:11:26 +01:00
Ed Addario
f184450806
Fix minor logic flaw
2025-09-22 20:10:42 +01:00
Ed Addario
1fbc59f867
Replace slope with cross product
2025-09-22 20:10:10 +01:00
Ed Addario
c855094dff
Exit loop if no better solution found
2025-09-22 20:09:11 +01:00
Gabe Goodhart
1d0125bcf1
feat: Add conversion support in GraniteHybrid for non-hybrid (all attn) ( #16177 )
...
This is a configuration of the hparams in the GraniteHybrid architecture
that devolves to the Granite (or GraniteMoe) architecture (i.e. Granite 3.x).
It may be used for some models in the Granite 4 family, with the
GraniteHybrid architecture acting as a superset arch. Rather than support
it directly in the C++ graph, we simply coerce the architecture flag back
to the correct "granite" or "granitemoe" architecture.
Branch: gabe-l-hart/GraniteNonHybridConversion
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
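The coercion described above can be pictured with a minimal sketch. Everything here is hypothetical: the struct, its field names, the "granitehybrid" string, and the rule that a non-zero expert count selects "granitemoe" are assumptions for illustration, not the converter's actual code.

```cpp
#include <string>

// Hypothetical summary of the hyperparameters relevant to the decision.
struct hparams_sketch {
    int n_recurrent_layers; // assumption: number of non-attention (recurrent) layers
    int n_experts;          // assumption: 0 means a dense, non-MoE model
};

// Pick the architecture name to write into the converted model.
// An all-attention configuration is coerced back to the plain Granite
// architectures instead of being kept under the hybrid superset.
std::string pick_arch(const hparams_sketch & hp) {
    if (hp.n_recurrent_layers > 0) {
        return "granitehybrid"; // genuinely hybrid: keep the superset arch
    }
    return hp.n_experts > 0 ? "granitemoe" : "granite";
}
```

Under these assumptions, an all-attention dense model is labelled "granite", so the existing graph code handles it without a dedicated hybrid path.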
2025-09-22 20:40:10 +02:00
Haiyue Wang
351f3da39c
clang-tidy : disable warning about performance enum size ( #16127 )
...
Disable 'performance-enum-size' checking:
Enum 'llama_token_type' uses a larger base type ('unsigned int', size: 4 bytes)
than necessary for its value set, consider using 'std::uint8_t' (1 byte) as the
base type to reduce its size.
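For context, a minimal sketch of what the check asks for. The enum names and values are hypothetical and are not the real `llama_token_type` definition.

```cpp
#include <cstdint>

// Hypothetical enum without a fixed underlying type. The compiler picks
// the underlying type; on common platforms it is int-sized (4 bytes).
enum token_kind_default {
    TOKEN_KIND_DEFAULT_UNDEFINED = 0,
    TOKEN_KIND_DEFAULT_NORMAL    = 1,
    TOKEN_KIND_DEFAULT_CONTROL   = 2,
};

// The same value set with the 1-byte base type the check suggests.
enum token_kind_compact : std::uint8_t {
    TOKEN_KIND_COMPACT_UNDEFINED = 0,
    TOKEN_KIND_COMPACT_NORMAL    = 1,
    TOKEN_KIND_COMPACT_CONTROL   = 2,
};

// Only the compact variant has a guaranteed size.
static_assert(sizeof(token_kind_compact) == 1, "fixed base type is one byte");
static_assert(sizeof(token_kind_compact) <= sizeof(token_kind_default),
              "compact enum is never larger than the default one");
```

Following the suggestion changes the size of the type, and a fixed underlying type is not valid in C before C23, so it may not be an option for an enum exposed through a C header. The commit disables the check rather than changing the enum.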
2025-09-22 19:57:46 +02:00
Sigbjørn Skjæret
3ecb2f671a
ggml : implement set_rows with i32 index ( #16159 )
...
* implement set_rows with i32 index
* template fix
* test quantized path
warnings--
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* forgotten name change
* deduplicate cuda/sycl and test-fix
* indent++
* vulkan: support set_rows with i32 index type (#16162)
* disable i32 index for webgpu for now
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-09-22 19:13:00 +02:00
Georgi Gerganov
432cf4304c
codeowners : update + cleanup ( #16174 )
...
---------
Co-authored-by: slaren <slarengh@gmail.com>
2025-09-22 18:20:21 +03:00
Adrien Gallouët
37a23c17bd
common : enable `--offline` mode without curl support ( #16137 )
...
* common : use the json parser
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* common : enable --offline mode without CURL support
This change refactors the download logic to properly support offline mode
even when the project is built without CURL.
Without this commit, using `--offline` would give the following error:
error: built without CURL, cannot download model from the internet
even if all the files are already cached.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
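The intended behaviour can be sketched as a small decision function. This is an illustration only: the names are hypothetical, and the handling of the online cases is an assumption rather than the actual download logic.

```cpp
// Hypothetical outcome of resolving a model file.
enum class fetch_action { use_cache, download, fail };

// offline:  the user passed --offline
// cached:   the file is already present in the local cache
// has_curl: the binary was built with CURL support
fetch_action decide_fetch(bool offline, bool cached, bool has_curl) {
    if (offline) {
        // Offline mode never touches the network, so CURL is irrelevant:
        // a cached file is usable even in a build without CURL.
        return cached ? fetch_action::use_cache : fetch_action::fail;
    }
    if (has_curl) {
        // Assumption: an online build with CURL goes through the download path.
        return fetch_action::download;
    }
    // Assumption: online without CURL cannot download and reports an error.
    return fetch_action::fail;
}
```

The case the commit fixes is `decide_fetch(true, true, false)`: offline, cached, and built without CURL, which should resolve to the cached file rather than the "built without CURL" error.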
2025-09-22 15:13:51 +03:00
Quentin Bramas
138c87ce8b
webui : fix handling incomplete chunks ( #16107 )
2025-09-22 11:53:13 +03:00
GideonSerf
c6db9a1027
embedding : fix typos in README ( #16171 )
2025-09-22 11:49:58 +03:00
Haiyue Wang
d05affbab7
common : remove unused local variables ( #16140 )
...
These two local variables 'arg' and 'arg_prefix' have been overridden by:
1. for (const auto & arg : opt.args)
2. for (int i = 1; i < argc; i++) {
const std::string arg_prefix = "--";
std::string arg = argv[i];
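A minimal sketch of the pattern, using a hypothetical function rather than the actual parser code: a declaration inside the loop hides the outer variable of the same name, so the outer one is never read and can be removed.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: collects argv[1..argc-1] as strings.
std::vector<std::string> collect_args(int argc, const char * const * argv) {
    std::string arg = "never read"; // outer declaration, hidden inside the loop
    std::vector<std::string> out;
    for (int i = 1; i < argc; i++) {
        std::string arg = argv[i];  // new variable that shadows the outer 'arg'
        out.push_back(arg);         // refers to the inner 'arg'
    }
    return out;
}
```

Deleting the outer `arg` leaves the behaviour unchanged, which is what makes the removal in this commit safe.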
2025-09-22 11:48:42 +03:00
Georgi Gerganov
4f324a556c
ggml : extend ggml_can_fuse to work with non-sequential nodes ( #16123 )
...
* ggml : extend ggml_can_fuse to work with non-sequential nodes in the graph
* cont : fix wrong bounds check condition
* cont : remove unnecessary overload
2025-09-22 11:12:37 +03:00
Georgi Gerganov
a71ae3ba7a
ggml : add ggml_op_is_empty ( #16122 )
...
* ggml : add ggml_op_is_empty
* ggml : move to ggml-impl.h
2025-09-22 11:12:09 +03:00
Xuan-Son Nguyen
05a2458121
codeowners : update ownership for @ngxson and @allozuar ( #16128 )
2025-09-22 11:10:58 +03:00
Shin-myoung-serp
96fdca043b
Vulkan: add conv_transpose_2d operation ( #16022 )
...
* Vulkan: add conv_transpose_2d operation
* Vulkan: fix typo in conv_transpose_2d shader(s0mp, s0L, s1mp, s1L)
* Vulkan: fix incorrect indentation in conv_transpose_2d shader
* Vulkan: add checking the push constants size limit and reuse conv2d_mm.comp for conv_transpose_2d operation
* Vulkan: revert the order of the index calculation and bound check in conv_2d shader
* Vulkan: explicitly check push constants limit in supports_op() for conv_transpose_2d operation.
* Vulkan: remove unnecessary lower bound checks for H/W_idx in the conv_2d shader.
2025-09-22 10:04:01 +02:00
Sigbjørn Skjæret
b2d980fce0
codeowners : claim responsibility for ci, models, gguf-py and convert ( #16124 )
...
* claim responsibility for ci, gguf-py and convert
* add myself to various src/llama- files
2025-09-22 10:59:05 +03:00
Georgi Gerganov
5c6106a696
contrib : update roles ( #16113 )
...
* contrib : update roles
* contrib : merge PR sections + add link to CI instructions
Updated pull request guidelines for contributors and collaborators, and clarified merging practices for maintainers.
2025-09-22 10:58:02 +03:00
Georgi Gerganov
ec65fb52f0
ci : remove vulkaninfo calls ( #16169 )
2025-09-22 10:16:05 +03:00
Georgi Gerganov
1d660d2fae
ci : use smaller model ( #16168 )
...
* ci : switch from gemma to qwen3 0.6b
* ci : use smaller model for some tests
2025-09-22 09:11:39 +03:00
Jeff Bolz
a20d810d79
vulkan: add RTE variants of exp shader ( #16165 )
...
This fixes some failures on Turing where "round to zero" rounds to the max f16
value but the CPU reference value is infinite.
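The mismatch follows from how IEEE 754 handles overflow under each rounding mode. The sketch below models only the overflow case for f16 (in-range values are returned unchanged, without mantissa rounding); the function and type names are hypothetical and this is not the shader code.

```cpp
#include <cmath>
#include <limits>

enum class rounding { to_nearest_even, toward_zero };

constexpr float k_f16_max = 65504.0f; // largest finite f16 value
// Halfway between 65504 and the next (unrepresentable) step 65536.
// Round-to-nearest-even overflows to infinity at or above this point.
constexpr float k_f16_rte_overflow = 65520.0f;

// Models what happens when a float that is too large is converted to f16.
float overflow_like_f16(float x, rounding mode) {
    if (std::isnan(x) || std::isinf(x)) {
        return x; // NaN and infinities pass through
    }
    const float mag = std::fabs(x);
    if (mag <= k_f16_max) {
        return x; // in range: precision loss intentionally not modelled
    }
    if (mode == rounding::toward_zero || mag < k_f16_rte_overflow) {
        // Round-toward-zero never produces infinity from a finite input.
        return std::copysign(k_f16_max, x);
    }
    return std::copysign(std::numeric_limits<float>::infinity(), x);
}
```

For example, exp(12) is roughly 162755, well past the f16 range: toward-zero conversion yields 65504, while round-to-nearest-even yields infinity, which is what a reference using round-to-nearest-even produces.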
2025-09-22 07:37:17 +02:00
Georgi Gerganov
4d0a7cbc61
ci : adjust params for less runtime ( #16167 )
...
* ci : adjust params for less runtime
* ci : gate BF16 on some hardware
* ci : move extra tests to Arm runner
2025-09-22 08:31:40 +03:00
Ruben Ortlam
9073a73d82
vulkan: vec dot matrix multiplication fix ( #16151 )
...
* vulkan: fix matrix multiplication index calculation for odd m/n and odd k in combination with batching
* add odd m/n + odd k test with batching
2025-09-22 07:22:43 +02:00
lhez
51f5a45fbe
opencl: fix concat crash on win arm64 with Adreno ( #15944 )
2025-09-21 16:42:10 -07:00
lhez
c4510dc937
opencl: initial `q8_0` mv support ( #15732 )
2025-09-21 14:48:44 -07:00
Ed Addario
b748a1efa7
Fix typo
2025-09-21 22:03:54 +01:00
Ed Addario
896cdc2121
Refactor potential overflow
2025-09-21 22:03:36 +01:00
Ed Addario
fecc472c61
Fix typos in variable names
2025-09-21 17:26:38 +01:00
Ed Addario
e92db008bc
Refactor quantisation checks into its own function
2025-09-21 17:20:48 +01:00
Georgi Gerganov
da30ab5f86
ci : add label for the RISC-V runner ( #16150 )
2025-09-21 19:00:27 +03:00
Ed Addario
814f6b66be
Minor general refactoring
2025-09-21 16:45:09 +01:00
Ed Addario
0d5f18303e
Refactor lagrange_penalty()
2025-09-21 16:22:00 +01:00
Ed Addario
9a1656eb97
Refactor pareto optimise and convexify
2025-09-21 16:21:35 +01:00
Ed Addario
1a3e9ea4c8
Refactor estimate_error()
2025-09-21 16:21:00 +01:00
Ed Addario
a7ee915e19
Refactor trimmed_sum()
2025-09-21 16:20:06 +01:00
Ed Addario
b09662f86a
Refactor estimate_lambda()
2025-09-21 16:19:49 +01:00
Ed Addario
17be7615ce
Refactor candidate types build
2025-09-21 16:19:28 +01:00
Ed Addario
08146fd67f
Refactor side_data() and copy_or_broadcast()
2025-09-21 16:19:03 +01:00
Ed Addario
7386d4eadd
Refactor row sampling
2025-09-21 16:18:26 +01:00
Ed Addario
b6c008fd8a
Refactor helper lambdas
2025-09-21 16:04:13 +01:00
Georgi Gerganov
28baac9c9f
ci : migrate ggml ci to self-hosted runners ( #16116 )
...
* ci : migrate ggml ci to self-hosted runners
* ci : add T4 runner
* ci : add instructions for adding self-hosted runners
* ci : disable test-backend-ops from debug builds due to slowness
* ci : add AMD V710 runner (vulkan)
* cont : add ROCM workflow
* ci : switch to qwen3 0.6b model
* cont : fix the context size
2025-09-21 16:50:45 +03:00
Ed Addario
b433fd9547
Refactor last budget pass
2025-09-21 13:43:09 +01:00
Ed Addario
c466c53808
Refactor pareto pruning and convexification
2025-09-21 13:42:54 +01:00