Introduces a new split mode that selects the minimum subset of GPUs needed to fit a model, based on estimated memory requirements. GPUs are sorted by free memory and selected until the cumulative memory meets the model's estimated needs. This reduces inter-GPU communication, improving performance and overall power consumption on multi-GPU systems.
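A minimal sketch of the greedy selection idea described above. The names (`gpu_info`, `select_min_gpus`) and the memory figures are illustrative assumptions, not the actual llama.cpp API:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct gpu_info {
    int      id;        // device index
    uint64_t free_mem;  // free memory in bytes
};

// Pick the smallest set of GPUs whose combined free memory covers the model's
// estimated requirement, preferring devices with the most free memory first.
static std::vector<int> select_min_gpus(std::vector<gpu_info> gpus, uint64_t required) {
    std::sort(gpus.begin(), gpus.end(),
              [](const gpu_info & a, const gpu_info & b) { return a.free_mem > b.free_mem; });

    std::vector<int> selected;
    uint64_t total = 0;
    for (const auto & g : gpus) {
        selected.push_back(g.id);
        total += g.free_mem;
        if (total >= required) {
            return selected;  // enough memory accumulated, stop adding devices
        }
    }
    return {};  // model does not fit even across all devices
}

int main() {
    // Hypothetical setup: 4 GPUs with varying free memory, ~20 GiB model estimate.
    const uint64_t GiB = 1024ull * 1024 * 1024;
    std::vector<gpu_info> gpus = {
        {0, 12 * GiB}, {1, 24 * GiB}, {2, 8 * GiB}, {3, 16 * GiB},
    };
    for (int id : select_min_gpus(gpus, 20 * GiB)) {
        printf("selected GPU %d\n", id);
    }
    return 0;
}
```

With these example numbers only GPU 1 is selected, since its free memory alone covers the estimate, so the remaining devices stay idle and no tensor traffic crosses GPUs.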