llama.cpp

dickbird 03fe95d545 vulkan : add dynamic VRAM heuristic for low-VRAM GPUs		2025-11-27 13:42:24 -05:00

Implements a dynamic VRAM allocation heuristic that automatically calculates the optimal number of GPU layers to offload based on available VRAM.

Changes:
- Added ggml_backend_vk_get_device_info and ggml_backend_vk_get_default_gpu_layers to ggml-vulkan.cpp
- Added dynamic heuristic to common_model_params_to_llama in common.cpp
- Added llama-vk-device-info tool for inspecting Vulkan devices
- Added documentation in docs/vulkan_low_vram.md

Tested on AMD RX 6500 XT with 4GB VRAM, achieving 2.5-3.1x speedup.

cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	vulkan : add dynamic VRAM heuristic for low-VRAM GPUs	2025-11-24 23:43:55 -05:00
src	vulkan : add dynamic VRAM heuristic for low-VRAM GPUs	2025-11-27 13:42:24 -05:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	ggml : remove dirty flag from version string (ggml/1391)	2025-11-24 15:26:31 +02:00