llama.cpp/common
Daniel Bevenius 7884b0e0ac
sampling : add support for backend sampling
This commit adds support for performing sampling operations on the
backend (e.g. GPU) as part of the model computation graph.

The motivation for this feature is to allow some or all of the sampling to be
performed directly on the backend, as part of the computation graph being
executed, rather than on the CPU after the fact.

For example, the backend sampler chain might select/sample a token directly,
in which case only the sampled token needs to be transferred from device
memory to host memory.

It is also possible for the backend samplers to perform filtering of the
logits, or to compute and filter the probability distribution, in which case
only the filtered logits or probabilities need to be transferred back to
system memory for further processing by CPU samplers.

Currently, backend sampling works in a manner similar to pooling: it is a
function called by build_graph, and the sampler operations become part of the
model's computation graph.
2025-11-17 16:15:58 +01:00
CMakeLists.txt cmake : cleanup (#17199) 2025-11-12 14:48:30 +02:00
arg.cpp sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
arg.h common: move download functions to download.(cpp|h) (#17059) 2025-11-07 11:23:34 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
chat-parser.h model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
chat.cpp common : move gpt-oss reasoning processing to init params (#16937) 2025-11-02 16:56:28 +02:00
chat.h chat: Add LFM2 tool handling (#16763) 2025-10-27 23:54:01 +01:00
common.cpp sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
common.h sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
console.cpp console : utf-8 fix for windows stdin (#9690) 2024-09-30 11:23:42 +03:00
console.h gguf : new file format with flexible meta data (beta) (#2398) 2023-08-21 23:07:43 +03:00
download.cpp cmake : move OpenSSL linking to vendor/cpp-httplib (#17177) 2025-11-12 12:32:50 +01:00
download.h arg: add --cache-list argument to list cached models (#17073) 2025-11-08 21:54:14 +01:00
http.h common: introduce http.h for httplib-based client (#16373) 2025-10-01 20:22:18 +03:00
json-partial.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp grammar : support array references in json schema (#16792) 2025-10-28 09:37:52 +01:00
json-schema-to-grammar.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
llguidance.cpp sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
log.cpp mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
log.h mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
sampling.h sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
speculative.cpp sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
speculative.h server : implement universal assisted decoding (#12635) 2025-07-31 14:25:23 +02:00