llama.cpp/include
Daniel Andersen b4078f77ab llama : add group split-mode to minimize GPU usage
Introduces a new split mode that selects the minimum subset of GPUs needed to fit a model
based on estimated memory requirements. GPUs are sorted by free memory and selected until
their cumulative free memory meets the model's estimated needs. This reduces inter-GPU
communication, improving performance and lowering overall power consumption on multi-GPU systems.
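The greedy selection described above can be sketched as follows. This is an illustrative standalone example, not the actual llama.cpp implementation; the names `GpuInfo` and `select_devices` are hypothetical.

```cpp
// Sketch of the device-selection idea from the commit message:
// sort GPUs by free memory (descending) and pick devices until their
// cumulative free memory covers the model's estimated footprint.
// GpuInfo and select_devices are illustrative names, not llama.cpp API.
#include <algorithm>
#include <cstdint>
#include <vector>

struct GpuInfo {
    int      device_id;
    uint64_t free_bytes;   // free VRAM reported for this device
};

// Returns the smallest prefix of GPUs (ordered by free memory) whose
// combined free memory meets required_bytes; empty if the model does
// not fit even across all devices.
std::vector<int> select_devices(std::vector<GpuInfo> gpus, uint64_t required_bytes) {
    std::sort(gpus.begin(), gpus.end(),
              [](const GpuInfo &a, const GpuInfo &b) { return a.free_bytes > b.free_bytes; });
    std::vector<int> chosen;
    uint64_t total = 0;
    for (const auto &g : gpus) {
        chosen.push_back(g.device_id);
        total += g.free_bytes;
        if (total >= required_bytes) {
            return chosen;   // minimal subset found
        }
    }
    return {};               // insufficient memory overall
}
```

For example, with GPUs offering 8, 24, and 16 GiB free and a model needing 30 GiB, the sketch picks the 24 GiB and 16 GiB devices and leaves the third idle, avoiding unnecessary inter-GPU traffic.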
2026-02-13 11:15:38 +00:00
..
llama-cpp.h lora: make sure model keeps track of associated adapters (#18490) 2026-01-15 10:24:28 +01:00
llama.h llama : add group split-mode to minimize GPU usage 2026-02-13 11:15:38 +00:00