llama.cpp

Oleksandr Kuvshynov 88d23ad515 vulkan: handle device dedup on MacOS + Vega II Duo cards (#19058) Deduplication here relied on Vulkan returning a unique UUID for each physical GPU, which is currently not always the case. On a Mac Pro 2019 running macOS with 2 Vega II Duo cards (4 GPUs in total), MoltenVK assigns the same UUID to pairs of GPUs unless they are connected with Infinity Fabric. See KhronosGroup/MoltenVK#2683 for details. The right fix belongs in MoltenVK, but until then llama.cpp would only recognize 2 of the 4 GPUs in such a configuration. The deduplication logic is changed to filter GPUs only if the UUID is the same but the driver is different.		2026-01-28 12:35:54 +01:00
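The rule described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual ggml-vulkan code: `DeviceInfo`, `driver_id`, and `dedup_devices` are hypothetical names, and the sketch simply keeps the first device seen, whereas the real backend may choose between drivers differently.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of a physical device (not the ggml-vulkan types).
struct DeviceInfo {
    std::array<uint8_t, 16> uuid;  // device UUID as reported by the driver
    uint32_t driver_id;            // identifies which driver exposes the device
};

// Returns the indices of the devices to keep. A device is dropped only when an
// already kept device has the same UUID AND a different driver. Devices that
// share a UUID under the same driver are all kept, which covers the case where
// one driver reports the same UUID for distinct GPUs.
std::vector<std::size_t> dedup_devices(const std::vector<DeviceInfo>& devices) {
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < devices.size(); ++i) {
        bool duplicate = false;
        for (std::size_t k : kept) {
            if (devices[k].uuid == devices[i].uuid &&
                devices[k].driver_id != devices[i].driver_id) {
                duplicate = true;
                break;
            }
        }
        if (!duplicate) {
            kept.push_back(i);
        }
    }
    return kept;
}
```

Under this sketch, four GPUs that share UUIDs in pairs under a single driver are all kept, while the same UUID exposed by two different drivers collapses to one entry.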
ggml-blas	ggml : add ggml_build_forward_select (#18550)	2026-01-19 20:03:19 +02:00
ggml-cann	ggml : add ggml_build_forward_select (#18550)	2026-01-19 20:03:19 +02:00
ggml-cpu	ggml-cpu: arm64: Q4_K scale unroll and vectorization (#19108)	2026-01-28 09:15:56 +02:00
ggml-cuda	cuda : fix "V is K view" check for non-unified KV cache (#19145)	2026-01-28 09:15:27 +02:00
ggml-hexagon	ggml-hexagon: flash-attn opt (#19025)	2026-01-23 22:02:07 -08:00
ggml-hip	HIP: fix AMDGPU_TARGETS, update documentation (#16803)	2025-10-27 21:39:49 +01:00
ggml-metal	metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088)	2026-01-25 20:07:19 +02:00
ggml-musa	CUDA: faster tile FA, add oob checks, more HSs (#16492)	2025-10-11 20:54:32 +02:00
ggml-opencl	opencl: add flattened q6_K mv (#19054)	2026-01-26 19:36:24 -08:00
ggml-rpc	rpc : use unordered_map::reserve and emplace (#18513)	2026-01-02 12:09:36 +02:00
ggml-sycl	[SYCL] use malloc to support both iGPU and dGPU in same time (#18992)	2026-01-23 20:54:10 +08:00
ggml-virtgpu	ggml: new backend for Virglrenderer API Remoting acceleration (v2) (#18718)	2026-01-28 17:49:40 +08:00
ggml-vulkan	vulkan: handle device dedup on MacOS + Vega II Duo cards (#19058)	2026-01-28 12:35:54 +01:00
ggml-webgpu	ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976)	2026-01-27 20:53:36 -08:00
ggml-zdnn	ggml-zdnn : mark zDNN buffers as non-host (#18967)	2026-01-22 01:16:21 +01:00
ggml-zendnn	ggml-zendnn : update ZenDNN git tag to main branch (#19133)	2026-01-28 06:21:36 +08:00
CMakeLists.txt	ggml: new backend for Virglrenderer API Remoting acceleration (v2) (#18718)	2026-01-28 17:49:40 +08:00
ggml-alloc.c	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)	2025-12-15 09:24:59 +01:00
ggml-backend-impl.h	llama: use host memory if device reports 0 memory (#18587)	2026-01-09 05:34:56 +08:00
ggml-backend-reg.cpp	ggml: new backend for Virglrenderer API Remoting acceleration (v2) (#18718)	2026-01-28 17:49:40 +08:00
ggml-backend.cpp	ggml : add ggml_build_forward_select (#18550)	2026-01-19 20:03:19 +02:00
ggml-common.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-impl.h	ggml : add ggml_build_forward_select (#18550)	2026-01-19 20:03:19 +02:00
ggml-opt.cpp	finetune: SGD optimizer, more CLI args (#13873)	2025-08-14 12:03:57 +02:00
ggml-quants.c	ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928)	2025-09-23 10:25:20 +02:00
ggml-quants.h	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
ggml-threading.cpp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
ggml-threading.h	remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)	2024-12-12 19:02:49 +01:00
ggml.c	ggml : add ggml_build_forward_select (#18550)	2026-01-19 20:03:19 +02:00
ggml.cpp	ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)	2025-06-01 13:43:57 +03:00
gguf.cpp	GGUF: check that tensor size is representable (#19072)	2026-01-24 21:57:51 +01:00