Implements a dynamic VRAM allocation heuristic that automatically calculates
the optimal number of GPU layers to offload based on available VRAM.
Changes:
- Added ggml_backend_vk_get_device_info and ggml_backend_vk_get_default_gpu_layers to ggml-vulkan.cpp
- Added dynamic heuristic to common_model_params_to_llama in common.cpp
- Added llama-vk-device-info tool for inspecting Vulkan devices
- Added documentation in docs/vulkan_low_vram.md
Tested on AMD RX 6500 XT with 4GB VRAM, achieving 2.5-3.1x speedup.