llama.cpp/ggml
Jeff Bolz 6a31e8d40e vulkan: Implement set_tensor_async and the event interfaces
The goal is to enable the async loading code paths in
llama_model_loader::load_all_data, originally from #7896. This works and the
loads themselves are faster, but with host-visible vidmem I think the cost of
allocating/mapping vidmem just moves elsewhere and becomes more expensive, so
I don't see a benefit by default. With GGML_VK_DISABLE_HOST_VISIBLE_VIDMEM=1,
however, I do see a significant improvement in model loading time.
2025-12-14 22:44:48 -06:00
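For context, a minimal sketch of the async-upload pattern this commit enables, written against the public ggml-backend API (ggml_backend_tensor_set_async plus the event interface). This is illustrative only, not the actual code in llama_model_loader::load_all_data; the helper name upload_async and the local variables are hypothetical, and buffer allocation and error handling are omitted.

// upload_async: queue a tensor copy on the backend and synchronize via an event.
// Assumes `backend` is an initialized backend (e.g. Vulkan), `tensor` is already
// allocated in a backend buffer, and `host_data`/`nbytes` hold the weight data.
#include "ggml.h"
#include "ggml-backend.h"

static void upload_async(ggml_backend_t backend, struct ggml_tensor * tensor,
                         const void * host_data, size_t nbytes) {
    // Queue the copy without blocking the calling thread. Backends that do not
    // implement set_tensor_async fall back to a synchronous copy.
    ggml_backend_tensor_set_async(backend, tensor, host_data, 0, nbytes);

    // Record an event after the queued copy so completion can be waited on later
    // (alternatively, ggml_backend_synchronize(backend) waits for all pending work).
    ggml_backend_event_t event = ggml_backend_event_new(ggml_backend_get_device(backend));
    ggml_backend_event_record(event, backend);

    // ... overlap other work here, e.g. reading the next tensor from disk ...

    ggml_backend_event_synchronize(event);
    ggml_backend_event_free(event);
}

Per the commit message above, the loading-time win is most visible with host-visible vidmem disabled, e.g. by running with GGML_VK_DISABLE_HOST_VISIBLE_VIDMEM=1 set in the environment.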
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (#17951) 2025-12-12 16:26:03 +02:00
src vulkan: Implement set_tensor_async and the event interfaces 2025-12-14 22:44:48 -06:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build (ggml/1394) 2025-12-14 08:33:51 +02:00