Commit Graph

6291 Commits

Author SHA1 Message Date
Ed Addario 61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided 2025-08-24 13:36:03 +01:00
Ed Addario 3856d60328
Restrict quant types per family 2025-08-23 14:45:07 +01:00
Ed Addario decafae270
Adjust bias_lambda 2025-08-23 11:30:11 +01:00
Ed Addario 68ae5e66ce
Improve list of candidate types 2025-08-23 02:50:55 +01:00
Ed Addario 73124a9921
Refactor estimate_error() 2025-08-23 02:17:22 +01:00
Ed Addario f75265f55b
Fix typo 2025-08-23 01:08:37 +01:00
Ed Addario 9a4b115497
Explicitly adding <atomic> include 2025-08-23 01:08:01 +01:00
Ed Addario 6d17889add
Log if override is from tensor-type or from bpw-target 2025-08-22 16:58:46 +01:00
Ed Addario fea99d051a
Refactor and combine lambdas 2025-08-22 16:57:58 +01:00
Ed Addario f05c8483d8
Improve dequantized_buffer fill 2025-08-22 09:17:58 +01:00
Ed Addario 897decbe8a
Show skipped IQ tensors 2025-08-22 09:15:11 +01:00
Ed Addario 01c927fb94
Improve pareto efficient candidate selection 2025-08-22 09:14:14 +01:00
Ed Addario 47cdbe2155
Reduce sampling window to speed up the process 2025-08-22 09:11:11 +01:00
Ed Addario 2f13fee795
Parameterise type 2025-08-22 09:05:55 +01:00
Ed Addario bb0d912c1f
Update comments 2025-08-22 09:02:56 +01:00
Ed Addario 35c1504441
Fix byte count for 3d or higher tensors 2025-08-22 09:01:57 +01:00
Ed Addario ec0afbe79f
Include embeddings and output tensors 2025-08-22 01:46:09 +01:00
Ed Addario e6eefa68f1
Merge branch 'master' into quantize 2025-08-21 19:22:24 +01:00
Ed Addario 5b6f1e9fde
General code refactor 2025-08-21 19:18:54 +01:00
Georgi Gerganov cd36b5e5c7
llama : remove deprecated llama_kv_self API (#15472)
ggml-ci
2025-08-21 19:13:45 +03:00
Georgi Gerganov 3f196be84b
graph : remove build_attn_with_sinks overload (#15469)
ggml-ci
2025-08-21 18:44:45 +03:00
Ed Addario 9e11f82e8f
Precompute error denominator in estimate_error() 2025-08-21 16:25:31 +01:00
Acly 97ae5961a4
vulkan : support conv_2d_dw with f16 weights (#15392) 2025-08-21 17:01:51 +02:00
Dong Won Kim 20c2dac8c6
vulkan: add exp operation (#15456)
Co-authored-by: aeseulgi <kim2h7903@gmail.com>
2025-08-21 17:00:16 +02:00
Jeff Bolz 96452a3fa4
vulkan: Reuse conversion results in prealloc_y (#15410)
* vulkan: Reuse conversion results in prealloc_y

Cache the pipeline and tensor that were most recently used to fill prealloc_y,
and skip the conversion if the current pipeline/tensor match.

* don't use shared pointer for prealloc_y_last_pipeline_used
2025-08-21 16:55:00 +02:00
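The caching idea described in commit 96452a3fa4 above can be illustrated with a minimal sketch. This is not the Vulkan backend's code: `Pipeline`, `Tensor`, `PreallocCache`, and `fill` are hypothetical names invented for illustration, and the real conversion dispatch is replaced by a counter. The sketch only assumes what the commit message states, namely that the most recently used pipeline and tensor are remembered (here as plain pointers rather than shared pointers) and the conversion is skipped when both match.

```cpp
// Hypothetical stand-ins for the backend's pipeline and tensor types.
struct Pipeline { int id; };
struct Tensor   { int id; };

// Remembers which (pipeline, tensor) pair most recently filled the
// scratch buffer, so a repeated request can reuse the existing contents.
struct PreallocCache {
    const Pipeline * last_pipeline = nullptr; // raw, non-owning pointers
    const Tensor   * last_tensor   = nullptr;
    int conversions = 0;                      // conversions actually run

    // Returns true if a conversion ran, false if the cached result was reused.
    bool fill(const Pipeline & p, const Tensor & t) {
        if (last_pipeline == &p && last_tensor == &t) {
            return false; // buffer already holds this conversion
        }
        ++conversions;    // placeholder for dispatching the real conversion
        last_pipeline = &p;
        last_tensor   = &t;
        return true;
    }
};
```

Calling `fill` twice in a row with the same pipeline and tensor runs the conversion only once; changing either argument triggers a new conversion. A real implementation would presumably also need to invalidate the cached pair whenever the buffer is overwritten or reallocated for another purpose, which this sketch does not model.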
Jie Fu (傅杰) 9ad5e60dba
examples : fix some typos in examples/model-conversion/README.md (#15477)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-08-21 16:53:13 +02:00
Ed Addario 887490c5ec
Dequantise sampled rows only 2025-08-21 15:11:49 +01:00
Georgi Gerganov 715a6db02c
kv-cache : drop the "unified" prefix (#15467)
* kv-cache : drop the "unified" prefix

ggml-ci

* cont : fix comment [no ci]
2025-08-21 17:00:33 +03:00
Jie Fu (傅杰) ad294df03f
examples : install torch-cpu for model conversion tool/example (#15475)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-08-21 15:42:34 +02:00
Ali Tariq 029bb39eb1
ci : enable RVV1.0 native build (#15386)
* Changed the CI file to hw

* Changed the CI file to hw

* Added to sudoers for apt

* Removed the clone command and used checkout

* Added libcurl

* Added gcc-14

* Checking gcc --version

* added gcc-14 symlink

* added CC and C++ variables

* Added the gguf weight

* Changed the weights path

* Added system specification

* Removed white spaces

* ci: Replace Jenkins riscv native build Cloud-V pipeline with GitHub Actions workflow

Removed the legacy .devops/cloud-v-pipeline Jenkins CI configuration and introduced .github/workflows/build-riscv-native.yml for native RISC-V builds using GitHub Actions.

* removed trailing whitespaces

* Added the trigger at PR creation

* Corrected OS name

* Added ccache as setup package

* Added ccache for self-hosted runner

* Added directory for ccache size storage

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Changed the build command and added ccache debug log

* Added the base dir for the ccache

* Re-trigger CI

* Cleanup and refactored ccache steps

* Cleanup and refactored ccache steps

---------

Co-authored-by: Akif Ejaz <akifejaz40@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-21 14:52:16 +02:00
Ed Addario e01dad886b
Parallelise candidate evaluation 2025-08-21 12:47:13 +01:00
Georgi Gerganov 30649cab65
ci : continue file download with wget (#15471)
ggml-ci
2025-08-21 13:42:55 +03:00
Daniel Bevenius 2758fa10da
examples : add model conversion tool/example (#15455)
* examples : add model conversion tool/example

This commit adds an "example/tool" that is intended to help in the
process of converting models to GGUF. Currently it supports normal
causal models and embedding models. The readme contains instructions and
commands to guide you through the process.

The motivation for this is to have a structured and repeatable process for
model conversions and hopefully with time improve upon it to make the
process easier and more reliable. We have started to use this for new
model conversions internally and will continue doing so and improve it
as we go along. Perhaps with time this should be placed in a different
directory than the examples directory, but for now it seems like a good
place to keep it while we are still developing it.

* squash! examples : add model conversion tool/example

Remove dependency on scikit-learn in model conversion example.

* squash! examples : add model conversion tool/example

Update transformer dep to use non-dev version. And also import
`AutoModelForCausalLM` instead of `AutoModel` to ensure compatibility
with the latest version.

* squash! examples : add model conversion tool/example

Remove the logits requirements file from the all requirements file.
2025-08-21 12:16:54 +02:00
Michael Giba b108e42904
ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221)
* Fix -Werror=return-type so ci/run.sh can run

* Update tools/mtmd/clip.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Remove false now that we have abort

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-21 12:06:46 +02:00
Copilot 245be739df
ci : add copilot-instructions.md (#15286)
* Initial plan

* Initialize copilot instructions exploration

* Add comprehensive .github/copilot-instructions.md file

* Update Python environment and tools directory documentation

- Add instructions for using .venv Python environment
- Include flake8 and pyright linting tools from virtual environment
- Add tools/ as core directory in project layout
- Reference existing configuration files (.flake8, pyrightconfig.json)

* add more python dependencies to .venv

* Update copilot instructions: add backend hardware note and server testing

* Apply suggestions from code review

* Apply suggestions from code review

* Replace clang-format with git clang-format to format only changed code

* Minor formatting improvements: remove extra blank line and add trailing newline

* try installing git-clang-format

* try just clang-format

* Remove --binary flag from git clang-format and add git-clang-format installation to CI

* download 18.x release

* typo--

* remove --binary flag

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-21 11:47:52 +02:00
Ed Addario 95b2ab2800
Change error estimate to use normalised weighted MSE 2025-08-21 10:46:37 +01:00
Julien Denize b2caf67db1
convert : make Mistral community chat templates optional via parameter (#15420)
* Make Mistral community chat templates optional

* Change the flag arg to disable instead of enable community chat templates

* Improve error message

* Improve help message

* Tone down the logger messages
2025-08-21 11:19:50 +02:00
Jie Fu (傅杰) 2f3dbffb17
common : fix incorrect print of non-ascii characters in the logging (#15466)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-08-21 11:54:34 +03:00
Ed Addario 5ef493ea1a
Exclude embeddings and output tensor 2025-08-21 09:48:29 +01:00
Xuan-Son Nguyen 945e1f12a6
ggml : fix condition of im2col on Metal backend (#15460) 2025-08-21 08:32:26 +03:00
stduhpf 1b0db8f6e0
server : fix webui (#15462)
* Fix webui crash after streaming

* build webui
2025-08-21 08:19:22 +03:00
Daniel Bevenius 29f538ac63
examples : remove references to `make` in examples [no ci] (#15457)
This commit removes references to `make` in the examples, as the build
system has been updated to use CMake directly and using `make` will now
generate an error since Commit 37f10f955f
("make : remove make in favor of CMake (#15449)").
2025-08-21 06:12:28 +02:00
R0CKSTAR 8ad038c0fd
musa: add GGML_UNUSED_VARS (#15446)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-21 11:06:05 +08:00
Diego Devesa 5682a3745f
sched : copy only the used experts when offloading prompt processing (#15346) 2025-08-21 01:35:28 +02:00
Ed Addario 35ad0fc4ad
Improve error estimation using weighted MSE 2025-08-20 23:27:20 +01:00
teo 1bc664a26a
server: fix OpenAI API compatibility for usage statistics in chat streams (#15444) 2025-08-21 00:10:08 +02:00
Johannes Gäßler 13aeb7aef2
CUDA: refactor FA support/selection code (#15454) 2025-08-20 23:14:14 +02:00
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00