llama.cpp/tests
Daniel Bevenius 7884b0e0ac
sampling : add support for backend sampling
This commit adds support for performing sampling operations on the
backend (e.g. GPU) as part of the model computation graph.

The motivation for this feature is to enable sampling to be performed
directly on the backend as part of the computation graph being executed,
allowing for some or all of the sampling to be done on the backend.

For example, the backend sampler chain might select/sample a token
directly in which case only the sampled token needs to be transferred
from device memory to host memory.

It is also possible for the backend samplers to perform filtering of
the logits, or compute and filter the probability distribution, in
which case only the filtered logits or probabilites need to be
transferred back to system memory for further processing by CPU
samplers.

Currently the backend sampling works in a similar manner to how
pooling works, it is a function that is called by build_graph and the
sampler operations become part of the models computation graph.
2025-11-17 16:15:58 +01:00
..
.gitignore gitignore : Ignore vim swap files in tests (#15901) 2025-09-10 14:28:47 +03:00
CMakeLists.txt sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
get-model.cpp ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
get-model.h ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00
test-alloc.cpp ggml : fix graph reallocation with multiple chunks (#16396) 2025-10-03 13:49:08 +02:00
test-arg-parser.cpp common : remove common_has_curl() (#16351) 2025-09-30 17:39:44 +03:00
test-autorelease.cpp llama : add `llama_vocab`, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-backend-ops.cpp metal : add cumsum (#17305) 2025-11-17 11:51:13 +02:00
test-backend-sampler.cpp sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
test-barrier.cpp test-barrier : do not use more threads than physically available (#16389) 2025-10-02 20:10:12 +02:00
test-c.c ggml : remove kompute backend (#14501) 2025-07-03 07:48:32 +03:00
test-chat-parser.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
test-chat-template.cpp chat : Granite Docling stopping (#16438) 2025-10-06 18:59:40 +02:00
test-chat.cpp chat: Add LFM2 tool handling (#16763) 2025-10-27 23:54:01 +01:00
test-double-float.cpp ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-gguf.cpp gguf: fix failure on version == 0 (#13956) 2025-06-01 18:08:05 +02:00
test-grammar-integration.cpp grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626) 2025-10-17 08:59:31 +03:00
test-grammar-llguidance.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-grammar-parser.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-json-partial.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp grammar : support array references in json schema (#16792) 2025-10-28 09:37:52 +01:00
test-llama-grammar.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-log.cpp common : use common_ prefix for common library functions (#9805) 2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
test-model-load-cancel.cpp llama : update llama_model API names (#11063) 2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c mtmd : add C public API (#13184) 2025-05-04 23:43:42 +02:00
test-opt.cpp tests : fix test-opt with GGML_BACKEND_DL (#15599) 2025-08-26 22:14:38 +02:00
test-quantize-fns.cpp tests : fix test-quantize-fns to init the CPU backend (#12306) 2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp ci: run the x64 and arm ci on the github machines instead (#16183) 2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp docker : do not build tests (#13204) 2025-04-30 10:44:07 +02:00
test-regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
test-rope.cpp ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) 2025-11-11 13:33:24 +02:00
test-sampling.cpp sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
test-thread-safety.cpp server : support unified cache across slots (#16736) 2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp llama : add `llama_vocab`, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-tokenizer-0.py py : logging and flake8 suppression refactoring (#7081) 2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
test-tokenizer-1-bpe.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-tokenizer-1-spm.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-tokenizer-random.py requirements : update transformers/torch for Embedding Gemma (#15828) 2025-09-09 06:06:52 +02:00
test-tokenizers-repo.sh devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00