llama.cpp

History

Oliver Simons 333da805fe Add initial version for top-p sampling As we only support static graphs for the time and we don't know the size of the output of top-p, we have to do value-scaling same as for min-p operator. Further improvements can be applied to the unit-test (i.e. check for equivalence of top_p happening on backend with top_p happening on cpu) and also by constructing candidates and sorting those as opposed to reversing the sort of the logits (this would be arange + get_rows instead of argsort + get_rows)	2025-11-28 15:16:20 +01:00
..
llama-cpp.h	llama : add `llama_vocab`, functions -> methods, naming (#11110 )	2025-01-12 11:32:42 +02:00
llama.h	Add initial version for top-p sampling	2025-11-28 15:16:20 +01:00

Oliver Simons 333da805fe Add initial version for top-p sampling

As we only support static graphs for the time and we don't know the size
of the output of top-p, we have to do value-scaling same as for min-p
operator.

Further improvements can be applied to the unit-test (i.e. check for
equivalence of top_p happening on backend with top_p happening on cpu)
and also by constructing candidates and sorting those as opposed to
reversing the sort of the logits (this would be arange +
get_rows instead of argsort + get_rows)

2025-11-28 15:16:20 +01:00

llama-cpp.h

llama : add `llama_vocab`, functions -> methods, naming (#11110 )

2025-01-12 11:32:42 +02:00

llama.h

Add initial version for top-p sampling

2025-11-28 15:16:20 +01:00