llama.cpp/ggml
dickbird 03fe95d545 vulkan : add dynamic VRAM heuristic for low-VRAM GPUs
Implements a dynamic VRAM allocation heuristic that automatically calculates
the optimal number of GPU layers to offload based on available VRAM.
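
As a rough illustration of the kind of calculation such a heuristic performs, here is a minimal sketch. It is not the code from this commit: `estimate_gpu_layers`, its parameters, and the fixed-reserve model are all hypothetical, and it assumes every layer costs about the same amount of VRAM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; NOT the function added by this commit.
// Assumption: each layer needs roughly bytes_per_layer of VRAM, and reserve_bytes
// is held back for the KV cache, scratch buffers, and the display.
int estimate_gpu_layers(uint64_t free_vram_bytes,
                        uint64_t reserve_bytes,
                        uint64_t bytes_per_layer,
                        int      n_layers) {
    assert(n_layers >= 0);

    // No headroom left after the reserve, or no usable per-layer estimate:
    // keep everything on the CPU.
    if (bytes_per_layer == 0 || free_vram_bytes <= reserve_bytes) {
        return 0;
    }

    const uint64_t usable = free_vram_bytes - reserve_bytes;
    const uint64_t fit    = usable / bytes_per_layer; // whole layers that fit

    // Never report more layers than the model actually has.
    return static_cast<int>(std::min<uint64_t>(fit, static_cast<uint64_t>(n_layers)));
}
```

With 4 GiB free, a 512 MiB reserve, and 128 MiB per layer, this sketch offloads 28 of 32 layers. A real heuristic would likely also account for context size and per-device memory budgets.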

Changes:
- Added ggml_backend_vk_get_device_info and ggml_backend_vk_get_default_gpu_layers to ggml-vulkan.cpp
- Added dynamic heuristic to common_model_params_to_llama in common.cpp
- Added llama-vk-device-info tool for inspecting Vulkan devices
- Added documentation in docs/vulkan_low_vram.md

Tested on an AMD RX 6500 XT with 4 GB of VRAM, achieving a 2.5-3.1x speedup.
2025-11-27 13:42:24 -05:00
..
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include vulkan : add dynamic VRAM heuristic for low-VRAM GPUs 2025-11-24 23:43:55 -05:00
src vulkan : add dynamic VRAM heuristic for low-VRAM GPUs 2025-11-27 13:42:24 -05:00
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : remove dirty flag from version string (ggml/1391) 2025-11-24 15:26:31 +02:00