llama.cpp

Commit Graph

Author	SHA1	Message	Date
Eve	d15d177f43	vulkan: faster q6_k matmul (#17813 ) * q6_k faster mul mat * 8 values * fix comment * switch to two at a time * start ci for .glsl files	2025-12-14 08:29:37 +01:00
jiahao su	a8c7f33d79	ci : change the cann version and the container pull method (#17953 ) fix error format Update build.yml Remove unnecessary zip files fix update	2025-12-12 20:43:00 +01:00
Sigbjørn Skjæret	45e350e3d3	ci: fix riscv64-native build (#17916 )	2025-12-10 23:24:31 +01:00
Xuan-Son Nguyen	6c2131773c	cli: new CLI experience (#17824 ) * wip * wip * fix logging, add display info * handle commands * add args * wip * move old cli to llama-completion * rm deprecation notice * move server to a shared library * move ci to llama-completion * add loading animation * add --show-timings arg * add /read command, improve LOG_ERR * add args for speculative decoding, enable show timings by default * add arg --image and --audio * fix windows build * support reasoning_content * fix llama2c workflow * color default is auto * fix merge conflicts * properly fix color problem Co-authored-by: bandoti <bandoti@users.noreply.github.com> * better loading spinner * make sure to clean color on force-exit * also clear input files on "/clear" * simplify common_log_flush * add warning in mtmd-cli * implement console writter * fix data race * add attribute * fix llama-completion and mtmd-cli * add some notes about console::log * fix compilation --------- Co-authored-by: bandoti <bandoti@users.noreply.github.com>	2025-12-10 15:28:59 +01:00
Sigbjørn Skjæret	7dba049b07	ci : disable ggml-ci-x64-amd-* (#17753 )	2025-12-04 11:25:08 +01:00
Reese Levine	7ca5991d2b	ggml webgpu: add support for emscripten builds (#17184 ) * Faster tensors (#8) Add fast matrix and matrix/vector multiplication. * Use map for shader replacements instead of pair of strings * Wasm (#9) * webgpu : fix build on emscripten * more debugging stuff * test-backend-ops: force single thread on wasm * fix single-thread case for init_tensor_uniform * use jspi * add pthread * test: remember to set n_thread for cpu backend * Add buffer label and enable dawn-specific toggles to turn off some checks * Intermediate state * Fast working f16/f32 vec4 * Working float fast mul mat * Clean up naming of mul_mat to match logical model, start work on q mul_mat * Setup for subgroup matrix mat mul * Basic working subgroup matrix * Working subgroup matrix tiling * Handle weirder sg matrix sizes (but still % sg matrix size) * Working start to gemv * working f16 accumulation with shared memory staging * Print out available subgroup matrix configurations * Vectorize dst stores for sg matrix shader * Gemv working scalar * Minor set_rows optimization (#4) * updated optimization, fixed errors * non vectorized version now dispatches one thread per element * Simplify * Change logic for set_rows pipelines --------- Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan> Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local> Co-authored-by: Reese Levine <reeselevine1@gmail.com> * Comment on dawn toggles * Working subgroup matrix code for (semi)generic sizes * Remove some comments * Cleanup code * Update dawn version and move to portable subgroup size * Try to fix new dawn release * Update subgroup size comment * Only check for subgroup matrix configs if they are supported * Add toggles for subgroup matrix/f16 support on nvidia+vulkan * Make row/col naming consistent * Refactor shared memory loading * Move sg matrix stores to correct file * Working q4_0 * Formatting * Work with emscripten builds * Fix test-backend-ops emscripten for f16/quantized 
types * Use emscripten memory64 to support get_memory * Add build flags and try ci --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> * Remove extra whitespace * Move wasm single-thread logic out of test-backend-ops for cpu backend * Disable multiple threads for emscripten single-thread builds in ggml_graph_plan * Fix .gitignore * Add memory64 option and remove unneeded macros for setting threads to 1 --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-12-03 10:25:34 +01:00
Ali Tariq	4eba8d9451	ci : RVV1.0 builds with tests (#16682 ) * Added RISC-V supported tests * Added default value for LLAMA_FATAL_WARNINGS and option to specify by user * Added RISC-V supported tests * Added default value for LLAMA_FATAL_WARNINGS and option to specify by user * Removed apt prompt * Added RISC-V specific tests with corrections Corrections included: 1. Changed the test names from debian to ubuntu as it is more stable than Debian Trixie 2. Added explicit compiler in cmake command as GCC compiler below version 14 have been recorded to throw errors with rvv1.0 and some other extensions 3. Added dependencies which are not installed by default in the RISC-V Ubuntu 24.04 4. Separate ccache directory for all jobs as all the ccache results are not the same and may cause ccache to not work * Resolved the merge conflict and cleaned up run.sh * Update ci/run.sh Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Removed previously added build ci for RISC-V * Removed trailing whitespaces * corrected build name Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * cleanup * Enabled build tests (1) Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Enabled build tests (2) Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * enable openssl --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-02 21:46:10 +01:00
Adrien Gallouët	28175f857d	cmake : add option to build and link BoringSSL (#17205 ) * cmake: add option to build and link BoringSSL Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : fix typo Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : disable boringssl test and asm by default Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : skip bssl Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : disable fips Signed-off-by: Adrien Gallouët <angt@huggingface.co> * cmake : fix cmake --install Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ci : use boringssl for windows and mac Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-21 11:46:45 +01:00
Adrien Gallouët	9cc4080441	ci : start using OpenSSL (#17235 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-11-21 11:45:00 +01:00
jiahao su	561a3e2788	ci : change the openEuler-310p image to fix release (#17361 )	2025-11-18 18:10:23 +01:00
jiahao su	ffa277a54c	CANN: Add openEuler-cann in build and release (#17192 ) Update openEuler version Remove variable ASCEND_SOC_TYPE Modify the chip type Fix case in zip filename Change "device" to "chip_type" Modify the value of chip_type	2025-11-18 16:08:55 +08:00
Eve	8b1c339bd2	ci : revert #16249 (#17303 ) * Delete .github/workflows/build-amd.yml * Update build.yml	2025-11-16 19:09:17 +01:00
sudhiarm	3fe36c3238	ci: add Arm-hosted Graviton4 runner (#17021 ) * ci: add Arm-hosted Graviton4 runner * ci: add missing dependencies for graviton4 build * ci: enable LFS checkout on graviton4 * ci: move git-lfs install to dependencies in Graviton4 workflow	2025-11-11 17:58:05 +02:00
Reese Levine	647b960bd8	ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031 ) * Faster tensors (#8) Add fast matrix and matrix/vector multiplication. * Use map for shader replacements instead of pair of strings	2025-11-07 19:27:20 -08:00
Max Krasnyansky	63d2fc46e1	Add experimental ggml-hexagon backend for the Hexagon NPU (#16547 ) * model: add support for extra bufs for all devices * hexagon: add experimental ggml-hexagon backend for the Hexagon NPU This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU. Highlights: - Supports Hexagon versions: v73, v75, v79, and v81 - Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5 - Supports Q4_0, Q8_0, MXFP4, and FP32 data types - Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX Note: This backend is experimental and may exhibit instability or limited performance across supported devices. It is intended for early testing and feedback from llama.cpp/ggml developer and user community. Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com> * hexagon: fix format checker errors * hexagon: update readme and cmake presets * ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions * hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input * hexagon: move ADB helper scripts into scripts/snapdragon/adb * hexagon: replace all f/printfs with GGML_LOG_... * readme: add hexagon to the list supported backends * hexagon: stack malmuts with quantized inputs only * hexagon: add TODO for fixing issues in hexagon_graph_optimize * hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC * scripts: fix lint errors * scripts: update qdc pytest script to make linter happy * hexagon: add reduce sum in fp32 * hexagon: reduce number of vector stores in matmul output * hexagon: remove the need for vdelta in reduce-multiply-x8 * hexagon: consistent use of reduce_sum_fp32 for row_sums * hexagon: some more matmul optimizations and comments Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models). 
We've handled those cases already but at a higher overhead. * hexagon: update cmake presets * hexagon: add OPMASK support for run-bench.sh wrapper * hexagon: update to use GGML_BACKEND_API * hexagon: remove unused logic for setting tensor flags for the views * hexagon: add asserts to set/get_tensor to make sure we handle complete tensors Same asserts as the CPU backend. * hexagon: use cpy_tensor slow path for non-host buffers * hexagon: error checks in the buffer allocator * cmake: move include(extProj) under ggml-hexagon * hexagon: don't forget to delete the backend on free * hexagon: set/get_tensor size assert apply only to quantized tensors * hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way. Ideally we need a bit more finer log levels. * docs: typos in hexagon developer docs (libggm-...) * hexagon: overhaul error handling in the session/device allocation this should handle all failure paths in the session allocation. * hexagon: update cmake presets to enable fp16 vectors * hexagon: remove unused time_usec function * hexagon: don't forget to release buffer contexts * hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure) * hexagon: remove custom can_repeat function and use ggml_can_repeat --------- Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com> Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>	2025-10-22 13:47:09 -07:00
Mathieu Baudier	8415f61e23	ci : add Vulkan on Ubuntu with default packages build (#16532 ) * ci: build Vulkan on Ubuntu with default packages * ci: disable tests in Vulkan build with default Ubuntu packages	2025-10-12 15:48:03 +02:00
sudhiarm	2c0d875ae6	ci: add ARM64 Kleidiai build and test support (#16462 )	2025-10-09 10:13:18 +02:00
Reese Levine	74b8fc17f9	ggml webgpu: profiling, CI updates, reworking of command submission (#16452 ) * Add profiling * More detailed profiling * Rework command submission to avoid global locks * Update wait handling * try new method of waiting on futures * Add serializing of command submission in some cases * Add new pool for timestamp queries and clean up logging * Serialize command submission in CI and leave a TODO note * Update webgpu CI * Add myself as WebGPU codeowner * Deadlock avoidance * Leave WebGPU/Vulkan CI serialized * Fix divide by 0 * Fix logic in division by inflight_threads * Update CODEOWNERS and remove serialize submit option	2025-10-07 13:48:56 -07:00
Sigbjørn Skjæret	3a002afafa	ci : refactor sdk caching to minimize storage (#16414 ) * refactor sdk caching to minimize storage * use correct action * add myself as owner to /.github/actions/ [no ci]	2025-10-06 17:40:21 +02:00
Daniel Bevenius	ad126479c2	ci : change macos-13 to macos-15-intel (#16401 ) This commit updates the macos-13 runners to macos-15-intel. The motivation for this changes is the macos-13 runners are scheduled to be retired on 2025-12-04. Refs: https://github.blog/changelog/2025-09-19-github-actions-macos-13-runner-image-is-closing-down/	2025-10-03 11:45:16 +02:00
Sigbjørn Skjæret	72ee736c44	ci : fix ubuntu-latest-cmake-rpc (disable ccache) (#16388 )	2025-10-02 13:51:36 +02:00
Eve	f09aefaa84	ci: update vulkan ci (#16294 )	2025-10-02 10:10:07 +02:00
Neo Zhang Jianyu	2be72c2b12	SYCL: Update to oneAPI 2025.2 (#16371 ) * update oneapi to 2025.2, use deep-learning-essentials to replace base-tool * update to 2025.2 use deeplearn essi to replace base toolkit * add missed dll * add deep learning essentials * add sycl-ls --------- Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>	2025-10-02 10:16:25 +03:00
uvos	1fe4e38cc2	ci: Properly install rocwmma for hip builds (#16305 ) * CI: Properly install rocwmma for hip builds on windows we now windows install rocwmma from ubuntu pacakges * CI: update linux rocm docker build to use rocm 7.0	2025-10-01 20:18:03 +02:00
Sigbjørn Skjæret	b2ba81dbe0	ci : fix ccache key for ubuntu-cpu-cmake (#16355 ) * fix ccache key for ubuntu-cpu-cmake * set it for release as well [no ci]	2025-09-30 21:41:42 +02:00
Sigbjørn Skjæret	2df5bcf357	ci : disable ccache for android (#16348 )	2025-09-30 15:38:01 +02:00
Georgi Gerganov	d72f5f7ba2	ci : add AMD runners and workflows (#16249 ) * ci : add AMD runners and workflows * ci : move AMD jobs to separate workflow * cont : fix paths	2025-09-29 17:51:48 +03:00
Aaron Teo	624207e676	devops: add s390x & ppc64le CI (#15925 ) * devops: move s390x and ppc64le ci build we have access to ubuntu-24.04-s390x and ppc64le images now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: disable ppc64le for now since they have compiler errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: stop warnings as errors Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: switch to non-macro flag Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: going the llama macro route Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add big-endian gguf test models Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: disable ppc64le to test s390x, check test build Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: dup .gguf.inp files for big-endian tests Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: dup .gguf.out files for big-endian too Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add python setup and endian byteswap Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: pooring thing does not have s390x python3 Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add missing rust compiler for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: try rust actions runner Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Revert "devops: try rust actions runner" This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c. 
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: try a different path for rust Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: dump home directory and user info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: install gguf-py only Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: missed relative path Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: remove big-endian files since local swapping is working Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: revert test-tokenizer-0 cmakelists Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix unicode flags conversion from and to uint16_t Bitfields are allocated in different order on s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Simplify byteswap command Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix endianness detection in vocab loader Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Disable test-thread-safety on s390x In this test a model is downloaded, then immediately loaded to check if more downloads are needed, and then used for test. There is no clean way to separate all those steps to add byteswapping between them, so just skip this test. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix q8_0 test in test-quantize-fns vec_signed uses unexpected rounding mode. Explicitly use different rounding function. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add big-endian stories260K Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: add s390x test-eval-callback Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: fix test does not exist Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: fix model not found llama-eval-callback Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix q3_K dot product error in test-quantize-fns on s390x Array q8bytes had only 4 elements allocated, but 8 elements accessed. 
This lead to write out of bounds and later read of overwritten values out of bounds and incorrect result. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: re-enable ppc64le for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: activate test-thread-safety for s390x Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: disable ppc64le tests for some reason it keeps failing test-thread-safety tests and I do not have a machine that is able to replicate the tests. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * devops: LLAMA_FATAL_WARNINGS=ON Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Correct repository URL for s390x for test-thread-safety model Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fix fs_get_cache_directory Ensure it works even if both XDG_CACHE_HOME and HOME are unset. This might happen in containers. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Re-enable CI for ppc64le Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Fortify ggml_rope_impl Only memcpy data from sections argument if it's non-NULL. Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> * Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way * Update URL for big-endian model * Update .github/workflows/build.yml Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update remaining mentions of BE models to ggml-org/models repo --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com> Co-authored-by: Aleksei Nikiforov <103434461+AlekseiNikiforovIBM@users.noreply.github.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-09-27 02:03:33 +08:00
R0CKSTAR	a86a580a66	musa: upgrade musa sdk to 4.3.0 (#16240 ) Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-09-26 02:56:38 +02:00
Eve	bee378e098	ci: run the x64 and arm ci on the github machines instead (#16183 ) * run the x64 ci on regular machines * set up the same thing for arm fix test-quantize-perf just like #12306 * try to disable sve * add another sve run	2025-09-25 08:06:06 +03:00
Georgi Gerganov	f505bd83ca	ci : disable AMD workflows + update NVIDIA workflows (#16200 ) * ci : disable AMD workflows + update NVIDIA workflows * cont : fixes * cont : update nvidia vulkan workflows	2025-09-23 20:41:40 +03:00
Georgi Gerganov	0889589dbe	ci : enable Vulkan workflow on Mac (#16194 )	2025-09-23 13:44:25 +03:00
Georgi Gerganov	ec65fb52f0	ci : remove vulkaninfo calls (#16169 )	2025-09-22 10:16:05 +03:00
Georgi Gerganov	4d0a7cbc61	ci : adjust params for less runtime (#16167 ) * ci : adjust params for less runtime * ci : gate BF16 on some hardware * ci : move extra tests to Arm runner	2025-09-22 08:31:40 +03:00
Georgi Gerganov	28baac9c9f	ci : migrate ggml ci to self-hosted runners (#16116 ) * ci : migrate ggml ci to a self-hosted runners * ci : add T4 runner * ci : add instructions for adding self-hosted runners * ci : disable test-backend-ops from debug builds due to slowness * ci : add AMD V710 runner (vulkan) * cont : add ROCM workflow * ci : switch to qwen3 0.6b model * cont : fix the context size	2025-09-21 16:50:45 +03:00
Daniel Bevenius	a91d035b90	ci : revert back to macos-13 for macOS-latest-cmake-x64 (#16040 ) This commit reverts the change of the runs-on parameter for the macOS-latest-cmake-x64 job back to macos-13 that was make in Commit `51abc96bdc` ("ci : update macos-latest* jobs to use macos-latest (#15938)"). The motivation for this is that using macos-latest will cause an ARM based runner to be used, and not an x64 based runner. Refs: https://github.com/ggml-org/llama.cpp/pull/15938#issuecomment-3300805127	2025-09-17 09:34:09 +02:00
Daniel Bevenius	77475530b8	ci : use macos-latest for arm64 webgpu build (#16029 ) This commit updates the runs-on field for the macOS arm64 webgpu build job to use macos-latest instead of just latest. The motivation for this is that this job can wait for a runner to pick up the job for a very long time, sometimes over 7 hours. This is an attempt to see if this change can help reduce the wait time. Refs: https://github.com/ggml-org/llama.cpp/actions/runs/17754163447/job/50454257570?pr=16004	2025-09-16 15:27:52 +02:00
Daniel Bevenius	76888d202e	ci : upload xcframework artifact from ios-xcode-build job (#16010 ) This commit updates the github workflows build.yml file to include steps for uploading and downloading the xcframework artifact. The macos-latest-swift job now depends on the ios-xcode-build job and downloads the xcframework artifact produced by it. The motivation for this changes is that it takes a long time to build the xcframework and we are currently doing this twice in the workflow. With this change, we only build it once and reuse the artifact.	2025-09-16 13:41:38 +02:00
Daniel Bevenius	51abc96bdc	ci : update macos-latest* jobs to use macos-latest (#15938 ) * ci : update macos-latest* jobs to use macos-latest This commit updates the jobs that are named macos-latest* to use the macos-latest label instead explicit versions. The motivation for this is that there is currently a mixuture of versions in this workflow and there are jobs that are failing because they require a newer version. Refs: https://github.com/ggml-org/llama.cpp/actions/runs/17644792595/job/50140010907#step:5:1759 * ci : add xcodebuild -downloadPlatform iOS command	2025-09-16 05:57:16 +02:00
lcy	a0e13dcbe5	build: fix the build failures of Windows HIP release job (#15984 ) * build: fix the cache keys for Windows HIP release job Update the cache keys to include the HIP SDK version, preventing the use of outdated ROCm installation caches. * build: sync changes from release.yml to build.yml - Update HIP SDK version to 25.Q3 and ROCm version to 6.4.2 - Update the cache keys to reflect the new versions * build: remove Windows HIP release for gfx1151 since the current stable rocWMMA does not support gfx1151.	2025-09-14 07:20:35 -07:00
Georgi Gerganov	55758b00ca	metal : refactor kernel loading (#15964 ) * metal : refactor bin kernels loading ggml-ci * metal : refactor rms kernel loading ggml-ci * ci : try to add memory leaks check ggml-ci * ci : try to enable memory leak detection for Mac * cont : seems to be working	2025-09-13 16:24:22 +03:00
Daniel Bevenius	ff02caf9ee	ci : cache ROCm installation in windows-latest-cmake-hip (#15887 ) This commit adds caching of the ROCm installation for the windows-latest-cmake-hip job. The motivation for this is that the installation can sometimes hang and/or not complete properly leaving an invalid installation which later fails the build. By caching the installation hopefully we can keep a good installation available in the cache and avoid the installation step. Refs: https://github.com/ggml-org/llama.cpp/pull/15365	2025-09-10 05:23:19 +02:00
Sigbjørn Skjæret	b143fbc87a	ci : fix hang in windows-hip build/release (#15365 ) * fix hang in windows-latest-cmake-hip * apply fix to release as well	2025-08-17 13:30:23 +02:00
Sigbjørn Skjæret	d3248d9b65	ci : fix ios-xcode-build (#15324 ) * fix ios-xcode-build * use xcode-select with fixed version * switch to macos-15 to get xcode 16.4	2025-08-15 14:02:39 +02:00
Diego Devesa	7aeee88cfe	ci : move ccache action to ggml-org fork (#15328 )	2025-08-15 12:27:02 +02:00
uvos	29c8fbe4e0	HIP: bump requirement to rocm 6.1 (#15296 )	2025-08-13 20:44:30 +02:00
Reese Levine	5fd160bbd9	ggml: Add basic SET_ROWS support in WebGPU (#15137 ) * Begin work on set_rows * Work on set rows * Add error buffers for reporting unsupported SET_ROWS indices * Remove extra comments	2025-08-06 15:14:40 -07:00
Reese Levine	9515c6131a	ggml: WebGPU disable SET_ROWS for now (#15078 ) * Add paramater buffer pool, batching of submissions, refactor command building/submission * Add header for linux builds * Free staged parameter buffers at once * Format with clang-format * Fix thread-safe implementation * Use device implicit synchronization * Update workflow to use custom release * Remove testing branch workflow * Disable set_rows until it's implemented * Fix potential issue around empty queue submission * Try synchronous submission * Try waiting on all futures explicitly * Add debug * Add more debug messages * Work on getting ssh access for debugging * Debug on failure * Disable other tests * Remove extra if * Try more locking * maybe passes? * test * Some cleanups * Restore build file * Remove extra testing branch ci	2025-08-05 16:26:38 -07:00
Reese Levine	587d0118f5	ggml: WebGPU backend host improvements and style fixing (#14978 ) * Add parameter buffer pool, batching of submissions, refactor command building/submission * Add header for linux builds * Free staged parameter buffers at once * Format with clang-format * Fix thread-safe implementation * Use device implicit synchronization * Update workflow to use custom release * Remove testing branch workflow	2025-08-04 08:52:43 -07:00
R0CKSTAR	3f4fc97f1d	musa: upgrade musa sdk to rc4.2.0 (#14498 ) * musa: apply mublas API changes Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: update musa version to 4.2.0 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: restore MUSA graph settings in CMakeLists.txt Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: disable mudnnMemcpyAsync by default Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: switch back to non-mudnn images Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * minor changes Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * musa: restore rc in docker image tag Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>	2025-07-24 20:05:37 +01:00

241 Commits