llama.cpp/common
Oliver Simons 1750917420 Fix different RNG-states between backend-sampling and llama-sampling
By default, we perform a warm-up step where the ggml_cgraph is computed
once. For backend-sampling, this graph contains the sampler, and thus
the RNG state of the backend's dist sampler is advanced once.

The solution is to reset the samplers after the warm-up has finished.
2025-12-19 11:42:10 +01:00
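
The effect described in the commit message above can be reproduced with a small standalone sketch. It is not the llama.cpp implementation: `toy_dist_sampler`, `draw`, `run_reference` and `run_with_warmup` are hypothetical names invented for illustration, and `std::mt19937` stands in for whatever RNG the backend dist sampler actually uses. The sketch assumes only what the message states: the warm-up pass draws once from the sampler, and resetting the sampler restores its seeded state.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for a seeded "dist" sampler (not the llama.cpp type).
struct toy_dist_sampler {
    uint32_t     seed;
    std::mt19937 rng;

    explicit toy_dist_sampler(uint32_t s) : seed(s), rng(s) {}

    // Draw one index from unnormalized weights; this advances the RNG state.
    int sample(const std::vector<double> & weights) {
        std::discrete_distribution<int> dist(weights.begin(), weights.end());
        return dist(rng);
    }

    // Restore the RNG to its initial seeded state.
    void reset() { rng.seed(seed); }
};

// Run n sampling steps over a fixed toy distribution and collect the indices.
std::vector<int> draw(toy_dist_sampler & s, int n) {
    const std::vector<double> weights = {0.1, 0.2, 0.3, 0.4};
    std::vector<int> out;
    for (int i = 0; i < n; ++i) {
        out.push_back(s.sample(weights));
    }
    return out;
}

// Reference path: no warm-up, so the first draw uses the freshly seeded RNG.
std::vector<int> run_reference(uint32_t seed, int n) {
    toy_dist_sampler s(seed);
    return draw(s, n);
}

// Path with a warm-up draw whose result is discarded. Without the reset,
// every later draw starts from an already advanced RNG state.
std::vector<int> run_with_warmup(uint32_t seed, int n, bool reset_after_warmup) {
    toy_dist_sampler s(seed);
    draw(s, 1); // warm-up: the sampler runs once as part of the graph
    if (reset_after_warmup) {
        s.reset();
    }
    return draw(s, n);
}
```

With `reset_after_warmup` set to `true`, `run_with_warmup(seed, n, true)` returns the same sequence as `run_reference(seed, n)`. With `false`, the RNG has already been advanced by the warm-up draw, so the two paths are no longer guaranteed to agree.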
CMakeLists.txt server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
arg.cpp Merge remote-tracking branch 'upstream/master' into backend-sampling 2025-12-19 09:38:01 +01:00
arg.h arg: fix common_params_parse not accepting negated arg (#17991) 2025-12-13 12:53:37 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
chat-peg-parser.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
chat-peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
chat.h chat : reserve memory in compute_diffs and improve naming (#17729) 2025-12-03 17:22:10 +02:00
common.cpp Fix different RNG-states between backend-sampling and llama-sampling 2025-12-19 11:42:10 +01:00
common.h Fix different RNG-states between backend-sampling and llama-sampling 2025-12-19 11:42:10 +01:00
console.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
console.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
download.cpp common : add minimalist multi-thread progress bar (#17602) 2025-12-12 12:44:35 +01:00
download.h server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
http.h common: introduce http.h for httplib-based client (#16373) 2025-10-01 20:22:18 +03:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
llguidance.cpp cont : naming 2025-11-30 11:24:30 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
peg-parser.cpp common : add nemotron 3 parsing (#18077) 2025-12-16 04:05:23 -06:00
peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
preset.cpp preset: handle negated arg, reverse the meaning if needed (#18041) 2025-12-14 22:08:10 +01:00
preset.h server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp common : disable backend sampling when grammar is involved 2025-12-18 10:52:21 +02:00
sampling.h common : disable backend sampling when grammar is involved 2025-12-18 10:52:21 +02:00
speculative.cpp common : restore grammar-based rejection sampling (#18137) 2025-12-17 19:46:00 +02:00
speculative.h server : implement universal assisted decoding (#12635) 2025-07-31 14:25:23 +02:00
unicode.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
unicode.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00