325 Commits

Author SHA1 Message Date
Diego Devesa 6adc3c3ebc
llama : add thread safety test (#14035)
* llama : add thread safety test

* llamafile : remove global state

* llama : better LLAMA_SPLIT_MODE_NONE logic

when main_gpu < 0 GPU devices are not used

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-16 08:11:43 -07:00
bandoti 0dbcabde8c
cmake: clean up external project logic for vulkan-shaders-gen (#14179)
* Remove install step for vulkan-shaders-gen

* Add install step to normalize msvc with make

* Regenerate modified shaders at build-time
2025-06-16 10:32:13 -03:00
Jeff Bolz 652b70e667
vulkan: force device 0 in CI (#14106) 2025-06-10 10:53:47 -05:00
Diego Devesa 7f4fbe5183
llama : allow building all tests on windows when not using shared libs (#13980)
* llama : allow building all tests on windows when not using shared libraries

* add static windows build to ci

* tests : enable debug logs for test-chat

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-06-09 20:03:09 +02:00
Yuanhao Ji 056eb74534
CANN: Enable labeler for Ascend NPU (#13914) 2025-06-09 11:20:06 +08:00
吴小白 5787b5da57
ci: add LoongArch cross-compile build (#13944) 2025-06-07 10:39:11 -03:00
Diego Devesa 2589ad3704
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997) 2025-06-04 15:37:40 +02:00
Diego Devesa 482548716f
releases : use dl backend for linux release, remove arm64 linux release (#13996) 2025-06-04 13:15:54 +02:00
bandoti d98f2a35fc
ci: disable LLAMA_CURL for Linux cross-builds (#13871) 2025-05-28 15:46:47 -03:00
Diego Devesa a2d02d5793
releases : bundle llvm omp library in windows release (#13763) 2025-05-25 00:55:16 +02:00
Diego Devesa 17fc817b58
releases : enable openmp in windows cpu backend build (#13756) 2025-05-24 22:27:03 +02:00
Diego Devesa b775345d78
ci : enable winget package updates (#13734) 2025-05-23 23:14:00 +03:00
Diego Devesa a70a8a69c2
ci : add winget package updater (#13732) 2025-05-23 22:09:38 +02:00
Diego Devesa 3079e9ac8e
release : fix windows hip release (#13707)
* release : fix windows hip release

* make single hip release with multiple targets
2025-05-23 00:21:37 +02:00
Diego Devesa d643bb2c79
releases : build CPU backend separately (windows) (#13642) 2025-05-21 22:09:57 +02:00
R0CKSTAR 33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647)
* musa: fix build warning (unused parameter)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: upgrade MUSA SDK version to rc4.0.1

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* Update ggml/src/ggml-cuda/cpy.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-21 09:58:49 +08:00
Alberto Cabrera Pérez f71f40a284
ci : upgraded oneAPI version in SYCL workflows and dockerfile (#13532) 2025-05-19 11:46:09 +01:00
Diego Devesa 415e40a357
releases : use arm version of curl for arm releases (#13592) 2025-05-16 19:36:51 +02:00
Sigbjørn Skjæret 7c07ac244d
ci : add ppc64el to build-linux-cross (#13575) 2025-05-16 14:54:23 +02:00
Thammachart Chinvarapon b064a51a4e
ci: free_disk_space flag enabled for intel variant (#13426)
before cleanup: 20G
after cleanup: 44G
after all built and pushed: 24G

https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245
2025-05-10 16:34:48 +02:00
Jeff Bolz dc1d2adfc0
vulkan: scalar flash attention implementation (#13324)
* vulkan: scalar flash attention implementation

* vulkan: always use fp32 for scalar flash attention

* vulkan: use vector loads in scalar flash attention shader

* vulkan: remove PV matrix, helps with register usage

* vulkan: reduce register usage in scalar FA, but perf may be slightly worse

* vulkan: load each Q value once. optimize O reduction. more tuning

* vulkan: support q4_0/q8_0 KV in scalar FA

* CI: increase timeout to accommodate newly-supported tests

* vulkan: for scalar FA, select between 1 and 8 rows

* vulkan: avoid using Float16 capability in scalar FA
2025-05-10 08:07:07 +02:00
Diego Devesa 15e03282bb
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step

* fix win cuda file name

* fix license file copy on multi-config generators
2025-05-08 23:45:22 +02:00
Diego Devesa 70a6991edf
ci : move release workflow to a separate file (#13362) 2025-05-08 13:15:28 +02:00
Diego Devesa 814f795e06
docker : disable arm64 and intel images (#13356) 2025-05-07 16:36:33 +02:00
Diego Devesa 9f2da5871f
llama : build windows releases with dl backends (#13220) 2025-05-04 14:20:49 +02:00
Diego Devesa 1d36b3670b
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
bandoti d24d592808
ci: fix cross-compile sync issues (#12804) 2025-05-01 19:06:39 -03:00
bandoti 00137157fc
Disable CI cross-compile builds (#13022) 2025-04-19 18:05:03 +02:00
hipudding 54a7272043
CANN: Add x86 build ci (#12950)
* CANN: Add x86 build ci

* CANN: fix code format
2025-04-15 12:08:55 +01:00
R0CKSTAR 8ac9f5d765
ci : Replace freediskspace to free_disk_space in docker.yml (#12861)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-11 09:26:17 +02:00
R0CKSTAR d9a63b2f2e
musa: enable freediskspace for docker image build (#12839)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-04-09 11:22:30 +02:00
Chenguang Li 6e1c4cebdb
CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786)
* [CANN] Support ELU and CONV_TRANSPOSE_1D

* [CANN]Modification review comments

* [CANN]Modification review comments

* [CANN]name adjustment

* [CANN]remove lambda used in template

* [CANN]Use std::func instead of template

* [CANN]Modify the code according to the review comments

---------

Signed-off-by: noemotiovon <noemotiovon@gmail.com>
2025-04-09 14:04:14 +08:00
Xuan-Son Nguyen bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
bandoti 1be76e4620
ci: add Linux cross-compile build (#12428) 2025-04-04 14:05:12 -03:00
0cc4m a8a1f33567
Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)
* Vulkan: Add DP4A MMQ and Q8_1 quantization shader

* Add q4_0 x q8_1 matrix matrix multiplication support

* Vulkan: Add int8 coopmat MMQ support

* Vulkan: Add q4_1, q5_0 and q5_1 quants, improve integer dot code

* Add GL_EXT_integer_dot_product check

* Remove ggml changes, fix mmq pipeline picker

* Remove ggml changes, restore Intel coopmat behaviour

* Fix glsl compile attempt when integer vec dot is not supported

* Remove redundant code, use non-saturating integer dot, enable all matmul sizes for mmq

* Remove redundant comment

* Fix integer dot check

* Fix compile issue with unsupported int dot glslc

* Update Windows build Vulkan SDK version
2025-03-31 14:37:01 +02:00
Guus Waals 0fd8487b14
Fix visionOS build and add CI (#12415)
* ci: add visionOS build workflow

Add a new GitHub Actions workflow for building on visionOS with CMake and Xcode.

* ggml: Define _DARWIN_C_SOURCE for visionOS to fix missing u_xxx typedefs

* ci: remove define hacks for u_xxx system types

---------

Co-authored-by: Giovanni Petrantoni <7008900+sinkingsugar@users.noreply.github.com>
2025-03-19 11:15:23 +01:00
Daniel Bevenius 7b61bcc87c
ci : add --symlinks to xcframework zip command (#12409)
This commit adds the --symlinks option to the zip command used to create
the xcframework zip file. This is necessary to preserve symlinks in the
zip file. Without this option, the Versions symlink is stored as a
regular directory entry in the zip file rather than as a symlink,
which causes the following error in Xcode:
```console
Couldn't resolve framework symlink for '/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current': readlink(/Users/danbev/work/ai/llama.cpp/tmp_1/build-apple/llama.xcframework/macos-arm64_x86_64/llama.framework/Versions/Current): Invalid argument (22)
```

Refs: https://github.com/ggml-org/llama.cpp/pull/11996#issuecomment-2727026377
2025-03-16 18:22:05 +01:00
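
The error quoted in the commit above (`readlink(...)`: Invalid argument (22)) is what `readlink` reports on POSIX systems when the path is not a symlink. The sketch below reproduces that condition in a temporary directory. It is only an illustration: the helper names `readlink_errno` and `demo` are hypothetical, the directory layout is a simplified stand-in for a framework's `Versions` folder, and the zip step itself is not modelled.

```python
import errno
import os
import tempfile


def readlink_errno(path):
    """Return None if readlink succeeds on path, otherwise the errno it raised."""
    try:
        os.readlink(path)
        return None
    except OSError as exc:
        return exc.errno


def demo():
    """Compare a real 'Current' symlink with a 'Current' that is a plain directory."""
    with tempfile.TemporaryDirectory() as root:
        versions = os.path.join(root, "Versions")
        os.makedirs(os.path.join(versions, "A"))
        current = os.path.join(versions, "Current")

        # Expected layout: Current is a symlink pointing at A.
        os.symlink("A", current)
        as_symlink = readlink_errno(current)

        # Layout after the symlink was flattened: Current is a regular directory.
        os.remove(current)
        os.mkdir(current)
        as_directory = readlink_errno(current)

        return as_symlink, as_directory


if __name__ == "__main__":
    print(demo())
```

On Linux and macOS the second value should be `errno.EINVAL`, the same code 22 that appears in the Xcode message.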
Oscar Barenys f08f4b3187
Update build.yml for Windows Vulkan builder to use Vulkan 1.4.304 SDK for VK_NV_cooperative_matrix2 support (#12301) 2025-03-12 20:06:58 +01:00
David Huang f1648e91cf
HIP: fix rocWMMA build flags under Windows (#12230) 2025-03-07 08:06:08 +01:00
David Huang 3ffbbd5ce1
HIP: rocWMMA documentation and enabling in workflow builds (#12179)
* Enable rocWMMA for Windows CI build

* Enable for Ubuntu

* GGML_HIP_ROCWMMA_FATTN documentation work
2025-03-06 14:14:11 +01:00
Daniel Bevenius 074c4fd39d
ci : add fetch-depth to xcframework upload (#12195)
This commit adds the fetch-depth: 0 option to the checkout action in the
build.yml workflow file (0 meaning that it fetches the complete
history). The default value is 1 when not specified which only fetches
the latest commit.

This is necessary to ensure that `git rev-list --count HEAD` counts the
total number of commits in the history. Currently, because the default
is being used, the name of the xcframework artifact is always
llama-b1-xcframework.
2025-03-05 14:16:40 +01:00
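
As a rough illustration of the commit above, the build number in the artifact name follows the output of `git rev-list --count HEAD`, so a checkout that contains a single commit always yields `b1`. The helper names `commit_count` and `xcframework_artifact_name` are hypothetical and are not taken from the workflow. Only the `llama-b<N>-xcframework` pattern comes from the commit message.

```python
import subprocess


def commit_count(repo_dir="."):
    """Count the commits reachable from HEAD in the checkout at repo_dir."""
    result = subprocess.run(
        ["git", "rev-list", "--count", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def xcframework_artifact_name(count):
    """Build the artifact name from a commit count (pattern from the commit message)."""
    return f"llama-b{count}-xcframework"


if __name__ == "__main__":
    # In a shallow checkout with one commit this prints llama-b1-xcframework.
    print(xcframework_artifact_name(commit_count()))
```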
Daniel Bevenius fa31c438e0
ci : fix xcframework artifact tag (#12191)
This commit adds the name parameter to the upload-artifact action to
ensure that the artifact is uploaded with the correct name.

The motivation for this is that the uploaded xcframework is currently
named llama-b1-xcframework.zip. With this change the name of this
artifact should contain the build number like the other artifacts.
2025-03-05 10:22:29 +01:00
Daniel Bevenius 3ccbfe5a71
ci : remove xframework upload (#12190)
* ci : remove xframework upload

This commit removes the upload of the xcframework zip file as an
artifact.

The motivation for this change is that the xcframework zip file is
currently being uploaded as part of the strategy, so the upload is
attempted multiple times and fails the build.

The upload should be moved elsewhere in the build to avoid this.

* ci : add xcframework upload to macos-latest job
2025-03-05 08:34:02 +01:00
Daniel Bevenius a057897ad4
llama : add xcframework build script (#11996)
* llama : add xcframework build script

This commit adds a script to build an XCFramework for the Apple
iOS, macOS, visionOS, and tvOS platforms.

The generated XCFramework can then be added to a project and used in
the same way as a regular framework. The llama.swiftui example project
has been updated to use the XCFramework and can be started using the
following command:
```console
$ open examples/llama.swiftui/llama.swiftui.xcodeproj/
```

Refs: https://github.com/ggml-org/llama.cpp/issues/10747

* examples : remove llama.cpp (source dir ref) from project.pbxproj

This commit removes the reference to llama.cpp from the project.pbxproj
file since Package.swift has been removed.

* ci : updated build.yml to use build-xcframework.sh

* ci : add xcframework build to github releases

This commit adds the ability to create a GitHub release with the
xcframework build artifact.

* scripts : add apple app validation scripts

This commit adds scripts that can validate the iOS, macOS, tvOS, and
visionOS applications. The scripts create a simple test app project,
copy the llama.xcframework to the test project, build and archive the
app, create an IPA from the archive, and validate the IPA using altool.

The motivation for this is to provide some basic validation and
hopefully avoid having to manually validate apps in Xcode.

* llama : remove Package.swift

This commit removes the Package.swift file, as we are now building an
XCFramework for the project.

* llama : remove Sources and spm-headers directories

* llama : use TargetConditionals.h for visionOS/tvOS
2025-03-05 06:30:31 +01:00
Daniel Bevenius 2679c3b55d
ci : set GITHUB_ACTION env var for server tests (#12162)
This commit tries to address/improve an issue with the server tests
which are failing with a timeout. Looking at the logs it seems like
they are timing out after 12 seconds:
```
FAILED unit/test_chat_completion.py::test_completion_with_json_schema[False-json_schema0-6-"42"] - TimeoutError: Server did not start within 12 seconds
```

This is somewhat strange as in utils.py we have the following values:
```python
DEFAULT_HTTP_TIMEOUT = 12

if "LLAMA_SANITIZE" in os.environ or "GITHUB_ACTION" in os.environ:
    DEFAULT_HTTP_TIMEOUT = 30

    def start(self, timeout_seconds: int | None = DEFAULT_HTTP_TIMEOUT) -> None:
```
A test running in a GitHub Action should therefore have a timeout of
30 seconds. However, this does not seem to be the case. Inspecting the
logs from the CI job, we can see the following environment variables:
```console
Run cd examples/server/tests
cd examples/server/tests
./tests.sh
shell: /usr/bin/bash -e {0}
env:
  LLAMA_LOG_COLORS: 1
  LLAMA_LOG_PREFIX: 1
  LLAMA_LOG_TIMESTAMPS: 1
  LLAMA_LOG_VERBOSITY: 10
  pythonLocation: /opt/hostedtoolcache/Python/3.11.11/x64

This probably does not address the underlying issue, which is that the
servers providing the models to be downloaded occasionally take longer
to respond, but it might improve the situation in some cases.
2025-03-03 16:17:36 +01:00
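
The timeout selection described in the commit above can be sketched as a small function. This is a simplified stand-in for the quoted `utils.py` logic, not a copy of it: the function `resolve_http_timeout` and the constant `CI_HTTP_TIMEOUT` are hypothetical names, and the environment is passed in as a plain mapping so the behaviour can be checked without touching the real process environment.

```python
import os

DEFAULT_HTTP_TIMEOUT = 12  # seconds, the default quoted in the commit message
CI_HTTP_TIMEOUT = 30       # seconds, used when a sanitizer or CI variable is set


def resolve_http_timeout(environ=None):
    """Pick the server start timeout based on which variables are present."""
    env = os.environ if environ is None else environ
    if "LLAMA_SANITIZE" in env or "GITHUB_ACTION" in env:
        return CI_HTTP_TIMEOUT
    return DEFAULT_HTTP_TIMEOUT


if __name__ == "__main__":
    # Variables listed in the CI log quoted above: neither trigger is present.
    logged_env = {
        "LLAMA_LOG_COLORS": "1",
        "LLAMA_LOG_PREFIX": "1",
        "LLAMA_LOG_TIMESTAMPS": "1",
        "LLAMA_LOG_VERBOSITY": "10",
    }
    print(resolve_http_timeout(logged_env))
    print(resolve_http_timeout({**logged_env, "GITHUB_ACTION": "1"}))
```

With only the variables shown in the log, the check falls through to the 12-second default, which is consistent with the failure message. Setting `GITHUB_ACTION` selects the 30-second value.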
Georgi Gerganov f3e64859ed
ci : fix arm upload artifacts (#12024)
* ci : fix arm upload artifacts

* cont : fix archive name to use matrix
2025-02-22 15:03:00 +02:00
Rohanjames1997 335eb04a91
ci : Build on Github-hosted arm64 runners (#12009) 2025-02-22 11:48:57 +01:00
Eve f7b1116af1
update release requirements (#11897) 2025-02-17 12:20:23 +01:00
Xuan-Son Nguyen 818a340ea8
ci : fix (again) arm64 build fails (#11895)
* docker : attempt fixing arm64 build on ci

* qemu v7.0.0-28
2025-02-16 10:36:39 +01:00
Georgi Gerganov 68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00