dickbird
e8bf9ed0ce
vulkan : add dynamic VRAM heuristic for low-VRAM GPUs
Implements a dynamic VRAM allocation heuristic that automatically calculates
the optimal number of GPU layers to offload based on available VRAM.
Changes:
- Added ggml_backend_vk_get_device_info and ggml_backend_vk_get_default_gpu_layers to ggml-vulkan.cpp
- Added dynamic heuristic to common_model_params_to_llama in common.cpp
- Added llama-vk-device-info tool for inspecting Vulkan devices
- Added documentation in docs/vulkan_low_vram.md
Tested on AMD RX 6500 XT with 4GB VRAM, achieving 2.5-3.1x speedup.
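For illustration only, a heuristic of the kind described above might look like the sketch below. This is not the code from the commit: the function name estimate_gpu_layers, its parameters, and the fixed per-layer size are all assumptions made for the example. It reserves some headroom (KV cache, compute buffers, driver overhead), divides the remaining free VRAM by an estimated per-layer size, and clamps the result to the model's layer count.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, NOT the function added by this commit.
// Estimates how many model layers fit in free VRAM after reserving
// headroom for the KV cache, compute buffers, and driver overhead.
static int estimate_gpu_layers(uint64_t free_vram_bytes,
                               uint64_t reserve_bytes,
                               uint64_t bytes_per_layer,
                               int      n_layers_total) {
    // Guard against degenerate inputs.
    if (bytes_per_layer == 0 || n_layers_total <= 0) {
        return 0;
    }
    // Nothing can be offloaded if the reserve already exceeds free VRAM.
    if (free_vram_bytes <= reserve_bytes) {
        return 0;
    }
    const uint64_t usable = free_vram_bytes - reserve_bytes;
    const uint64_t fit    = usable / bytes_per_layer;
    // Never report more layers than the model actually has.
    return static_cast<int>(std::min<uint64_t>(fit, static_cast<uint64_t>(n_layers_total)));
}
```

With 4 GiB free, a 1 GiB reserve, and an assumed 128 MiB per layer, this sketch yields 24 of 32 layers. A real implementation would likely derive the per-layer size from the model's tensors rather than take a constant.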
2025-11-27 20:36:52 -05:00
dickbird
5ecff8a9a9
vulkan : add dynamic VRAM heuristic for low-VRAM GPUs
2025-11-24 23:43:55 -05:00
Danny Milosavljevic
c2a67efe38
vulkan: Make Vulkan optional at runtime (#11493). (#11494)
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-02-10 07:17:21 +01:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries (#10256)
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00
Diego Devesa
f010b77a37
vulkan : add backend registry / device interfaces (#9721)
* vulkan : add backend registry / device interfaces
* llama : print devices used on model load
2024-10-17 02:46:58 +02:00
Diego Devesa
c83ad6d01e
ggml-backend : add device and backend reg interfaces (#9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-03 01:49:47 +02:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake (#8006)
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122)
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00