llama.cpp/src
Daniel Bevenius 51fee29822
sampling : always populate logits for sampled probs
This commit updates common/sampler.cpp set_logits and
src/llama-sampling.cpp llama_sampler_sample to always populate the
logits field when backend sampled probabilities are available.

The motivation for this is to ensure that CPU samplers always have
access to the logit values even when the probabilities have been
produced by backend samplers.
2025-11-19 07:14:11 +01:00
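The idea is easiest to see in terms of the public llama_token_data struct from llama.h, which carries both a logit and a p field per candidate token. Below is a minimal sketch of that idea, not the actual patch (which touches set_logits in common/sampler.cpp and llama_sampler_sample in src/llama-sampling.cpp); build_token_data, backend_probs and n_vocab are illustrative names assumed for this example.

```cpp
// Minimal sketch, not the actual commit: when a backend sampler has already
// produced per-token probabilities, the candidate array handed to CPU
// samplers should still carry logits.
#include <cmath>
#include <vector>

#include "llama.h"

static std::vector<llama_token_data> build_token_data(
        const float * logits,        // raw logits, e.g. from llama_get_logits_ith()
        const float * backend_probs, // backend-sampled probabilities, may be NULL
        int32_t       n_vocab) {
    std::vector<llama_token_data> cur;
    cur.reserve(n_vocab);

    for (llama_token id = 0; id < n_vocab; ++id) {
        const float p = backend_probs ? backend_probs[id] : 0.0f;

        // Always populate the logit field so that CPU samplers (top-k,
        // temperature, grammar, ...) can operate on it even when the
        // probabilities came from a backend sampler. If only probabilities
        // are available, log(p) is one possible stand-in.
        const float logit = logits ? logits[id]
                                   : (p > 0.0f ? std::log(p) : -INFINITY);

        cur.push_back({ id, logit, p });
    }

    return cur;
}
```

The CPU sampler chain then consumes the resulting llama_token_data_array as usual, regardless of whether p was filled in by a backend sampler.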
models model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
CMakeLists.txt sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
llama-adapter.cpp aLoRA Support (#15327) 2025-09-05 17:32:39 -06:00
llama-adapter.h aLoRA Support (#15327) 2025-09-05 17:32:39 -06:00
llama-arch.cpp model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
llama-arch.h model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
llama-backend-sampler.cpp sampling : add support for backend sampling 2025-11-17 16:15:58 +01:00
llama-batch.cpp sampling : ensure at most one output token per seq 2025-11-18 16:06:23 +01:00
llama-batch.h sampling : ensure at most one output token per seq 2025-11-18 16:06:23 +01:00
llama-chat.cpp model : add openPangu-Embedded (#16941) 2025-11-05 10:28:58 +01:00
llama-chat.h model : add openPangu-Embedded (#16941) 2025-11-05 10:28:58 +01:00
llama-context.cpp sampling : ensure at most one output token per seq 2025-11-18 16:06:23 +01:00
llama-context.h sampling : always expose sampled_ids 2025-11-18 15:11:59 +01:00
llama-cparams.cpp cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) 2025-06-15 10:08:58 +03:00
llama-cparams.h server : support unified cache across slots (#16736) 2025-11-02 18:14:04 +02:00
llama-grammar.cpp `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) 2025-05-25 01:48:08 +01:00
llama-grammar.h `tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) 2025-03-05 13:05:13 +00:00
llama-graph.cpp sampling : remove version from sampler chain 2025-11-19 06:59:03 +01:00
llama-graph.h sampling : remove version from sampler chain 2025-11-19 06:59:03 +01:00
llama-hparams.cpp hparams : add n_embd_inp() to support extended embed (#16928) 2025-11-07 19:27:58 +01:00
llama-hparams.h hparams : add n_embd_inp() to support extended embed (#16928) 2025-11-07 19:27:58 +01:00
llama-impl.cpp GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
llama-impl.h llama: use FA + max. GPU layers by default (#15434) 2025-08-30 16:32:10 +02:00
llama-io.cpp llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama-io.h llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama-kv-cache-iswa.cpp kv-cache : pad the cache size to 256 for performance (#17046) 2025-11-07 20:03:25 +02:00
llama-kv-cache-iswa.h llama: print memory breakdown on exit (#15860) 2025-09-24 16:53:48 +02:00
llama-kv-cache.cpp model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
llama-kv-cache.h memory : remove KV cache size padding (#16812) 2025-10-28 20:19:44 +02:00
llama-kv-cells.h llama: store mrope data in KV cell (#16825) 2025-10-29 18:09:18 +01:00
llama-memory-hybrid.cpp memory : use sequential equal splits for recurrent modules (#16442) 2025-10-07 08:24:17 +03:00
llama-memory-hybrid.h llama: print memory breakdown on exit (#15860) 2025-09-24 16:53:48 +02:00
llama-memory-recurrent.cpp memory: Hybrid context shift (#17009) 2025-11-10 17:14:23 +02:00
llama-memory-recurrent.h llama: consistent ctx <-> buf order for KV cache (#16746) 2025-10-28 11:23:54 +01:00
llama-memory.cpp memory : correctly handle failure in apply() (#14438) 2025-06-30 18:03:03 +03:00
llama-memory.h llama: print memory breakdown on exit (#15860) 2025-09-24 16:53:48 +02:00
llama-mmap.cpp llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) 2025-06-05 11:57:42 +02:00
llama-mmap.h llama-mmap: fix missing include (#11796) 2025-02-10 20:58:18 +02:00
llama-model-loader.cpp model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
llama-model-loader.h model: support GLM 4.5 family of models (#14939) 2025-08-04 20:29:25 +02:00
llama-model-saver.cpp llama : improve sep token handling (#14272) 2025-06-20 14:04:09 +02:00
llama-model-saver.h llama/ggml: add LLM training support (#10544) 2025-05-12 14:44:49 +02:00
llama-model.cpp graph : do not include llama-model.h 2025-11-18 13:53:25 +02:00
llama-model.h model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
llama-quant.cpp llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
llama-quant.h llama : refactor `src/llama.cpp` (#10902) 2025-01-03 10:18:53 +02:00
llama-sampling.cpp sampling : always populate logits for sampled probs 2025-11-19 07:14:11 +01:00
llama-sampling.h sampling : remove version from sampler chain 2025-11-19 06:59:03 +01:00
llama-vocab.cpp model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
llama-vocab.h model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
llama.cpp llama-quant: add support for mmproj (#16592) 2025-10-15 14:48:08 +02:00
unicode-data.cpp server : better security control for public deployments (#9776) 2024-10-08 13:27:04 +02:00
unicode-data.h llama : reduce compile time and binary size (#9712) 2024-10-02 15:49:55 +02:00
unicode.cpp model : add AfmoeForCausalLM support (#16477) 2025-11-14 13:54:10 +01:00
unicode.h devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00