llama.cpp/include
Daniel Bevenius 1e8c02aa95
llama : add n_sampling_outputs_max cparam
This commit adds a compute graph parameter named n_sampling_outputs_max
which is intended to be used as a maximum (cap) on the number of
outputs for backend sampling.

The motivation for this is that it provides a configurable value in
place of the hardcoded macro (LLAMA_MAX_SAMPLING_OUTPUTS), which has
been removed.

I'm not sure if this is the best option, as having multiple outputs per
sequence might not be the most common use case; I need to think a little
more about this. I'm committing this to check that CI passes. This
parameter should also be exposed as a common option for tools, which
I'll do in a follow-up commit.
2026-02-27 06:03:07 +01:00
llama-cpp.h lora: make sure model keep track of associated adapters (#18490) 2026-01-15 10:24:28 +01:00
llama.h llama : add n_sampling_outputs_max cparam 2026-02-27 06:03:07 +01:00