llama.cpp/ggml/src
Jeff Bolz cdbada8d10
vulkan: Add perf logger mode with concurrency (#17944)
This implements a variation of the perf logger. Rather than timing each
operation individually, with what is effectively a barrier in between, it
places the timing boundaries where we already synchronize and times the
groups of work that normally overlap. This helps show whether individual
operations need to be optimized or whether the group is already running
efficiently.

GGML_VK_PERF_LOGGER_CONCURRENT=1 enables the new mode (when
GGML_VK_PERF_LOGGER is also set).
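
The dependency between the two variables can be sketched as below. This is only an illustration of the rule stated above, not the backend's actual code: the `PerfLoggerMode` enum and the helpers `env_flag_set`, `select_mode`, and `perf_logger_mode_from_env` are hypothetical names, and treating a variable as enabled whenever it is set is an assumption (the real backend may parse the value differently).

```cpp
#include <cstdlib>

// Hypothetical enum for illustration; not taken from ggml-vulkan.
enum class PerfLoggerMode { Off, PerOp, Concurrent };

// Assumption: a variable counts as enabled when it is present in the
// environment, regardless of its value.
static bool env_flag_set(const char * name) {
    return std::getenv(name) != nullptr;
}

// The concurrent mode only takes effect when the perf logger itself is on.
static PerfLoggerMode select_mode(bool perf_logger, bool concurrent) {
    if (!perf_logger) {
        return PerfLoggerMode::Off;
    }
    return concurrent ? PerfLoggerMode::Concurrent : PerfLoggerMode::PerOp;
}

// Reads both variables named in the commit message and picks a mode.
static PerfLoggerMode perf_logger_mode_from_env() {
    return select_mode(env_flag_set("GGML_VK_PERF_LOGGER"),
                       env_flag_set("GGML_VK_PERF_LOGGER_CONCURRENT"));
}
```

Under these assumptions, setting only GGML_VK_PERF_LOGGER_CONCURRENT leaves the logger off, which matches the "when GGML_VK_PERF_LOGGER is also set" condition.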

GGML_VK_SYNC_LOGGER=1 replaces the ENABLE_SYNC_LOGGING compile-time switch.
2025-12-19 06:36:46 +01:00
ggml-blas sync : whisper.cpp (ggml/1359) 2025-09-29 17:43:58 +03:00
ggml-cann cann : fix ops broken by circular padding guard (#17825) 2025-12-12 15:49:27 +01:00
ggml-cpu ggml-cpu: extend support for RVV floating-point kernels (#17318) 2025-12-18 16:02:09 +02:00
ggml-cuda model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106) 2025-12-19 00:18:01 +01:00
ggml-hexagon ggml-hexagon: swiglu_oai operation (#18114) 2025-12-17 13:38:21 -08:00
ggml-hip HIP: fix AMDGPU_TARGETS, update documentation (#16803) 2025-10-27 21:39:49 +01:00
ggml-metal metal: use shared buffers on eGPU (#17866) 2025-12-15 16:14:49 +02:00
ggml-musa CUDA: faster tile FA, add oob checks, more HSs (#16492) 2025-10-11 20:54:32 +02:00
ggml-opencl ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985) 2025-12-06 15:07:02 +01:00
ggml-rpc ggml : improve error handling for search path existence checks (#17653) 2025-12-06 12:28:16 +01:00
ggml-sycl [SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (#17826) 2025-12-15 10:35:15 +08:00
ggml-vulkan vulkan: Add perf logger mode with concurrency (#17944) 2025-12-19 06:36:46 +01:00
ggml-webgpu ggml webgpu: unary op suppport, code refactoring, ops support (#17764) 2025-12-05 12:25:51 -08:00
ggml-zdnn zdnn: refactor codebase + add docs (#16178) 2025-09-23 14:53:05 +08:00
ggml-zendnn ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690) 2025-12-07 00:13:33 +08:00
CMakeLists.txt llama.android : Rewrite Android binding (w/o cpu_features dep) (#17413) 2025-12-17 10:14:47 +02:00
ggml-alloc.c llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) 2025-12-15 09:24:59 +01:00
ggml-backend-impl.h rpc : add support for multiple devices (#16276) 2025-10-04 12:49:16 +03:00
ggml-backend-reg.cpp ggml-zendnn : add ZenDNN backend for AMD CPUs (#17690) 2025-12-07 00:13:33 +08:00
ggml-backend.cpp llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) 2025-12-15 09:24:59 +01:00
ggml-common.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-impl.h ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (#17063) 2025-11-13 20:54:47 +02:00
ggml-opt.cpp finetune: SGD optimizer, more CLI args (#13873) 2025-08-14 12:03:57 +02:00
ggml-quants.c ggml : fix uninitialized is_on_grid in quantize_row_iq3_xxs_impl (#15928) 2025-09-23 10:25:20 +02:00
ggml-quants.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) 2025-12-15 09:24:59 +01:00
ggml.cpp ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) 2025-06-01 13:43:57 +03:00
gguf.cpp ggml, llama : use defaulted constructors/destructors (#17649) 2025-12-03 07:12:18 +01:00