llama.cpp

ddh0 13f1e4a9ca llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00

* initial commit for branch
* simplify constants
* add params to `struct common_params_sampling`, add reference to PR
* explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`
* add args, rename `queue_size` -> `window_size`
* improved comments
* minor
* remove old unused code from algorithm
* minor
* add power law case to `common_sampler_init`, add sampler name mappings
* clarify behaviour when `window_size = 0`
* add missing enums
* remove `target_range` param, make `target == 1` no-op, cleanup code
* oops, straggler
* add missing parameters in `server-task.cpp`
* copy from author
  ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
* remove old debug log, style nit
* fix compiler warning, add commented-out logging per token
* re-write + change parameters + simplify
* oops forgot args.cpp
* fix leftover `window_size`
* add missing values to `common_params_sampling::print()`
* with logging
* does this fix it?
* no, but does this?
* update default decay
* optimize
* fix bad merge
  my git skills are lacking
* silence `missing initializer for member`
* update default decay to 0.9
* fix logging
* format (double)
* add power law to the new `samplers` vector
* log sampler init values
* improve logging messages in llama_sampler_power_law
* remove extraneous logging
* simplify target computation
  last commit with debug logging!
* remove debug logging, explicitly clamp params at init
* add `use_power_law` flag + logic, minor cleanup
* update `power-law` -> `adaptive-p`
* fix cold start EMA
  - `ctx->weighted_sum` is now initialized and reset to `target / (1.0f - clamped_decay)`
  - `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f - clamped_decay)`
  this fixes a "cold start" problem with the moving average
* update `SHARPNESS` constant to `10.0f`
* minor style fixes
  no functional changes
* minor style fixes cont.
* update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)
* separate into `apply` + `accept` functions
* `pending_token_idx`: switch from `llama_token` to `int32`
  functionally identical (`llama.h` has `typedef int32_t llama_token;`), but its more correct now
* don't transform logits <= -1e9f
* fix masking in backend top-p, min-p
* address review comments
* typo in comments `RND` -> `RNG`
* add docs
* add recommended values in completion docs
* address PR feedback
* remove trailing whitespace (for CI `editorconfig`)
* add to adaptive-p to `common_sampler_types_from_chars`

CMakeLists.txt	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)	2026-01-14 20:29:35 +01:00
arg.cpp	llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00
arg.h	vendor : update cpp-httplib to 0.30.0 (#18660)	2026-01-08 13:53:54 +01:00
base64.hpp	llava : expose as a shared library for downstream projects (#3613)	2023-11-07 00:36:23 +03:00
build-info.cpp.in	cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)	2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h	Fix Kimi-K2 tool-call parsing issues (#17376)	2025-12-08 14:32:04 +01:00
chat-parser.cpp	model : add EXAONE MoE (#18543)	2026-01-13 23:28:38 +01:00
chat-parser.h	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
chat-peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
chat-peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
chat.cpp	model : add EXAONE MoE (#18543)	2026-01-13 23:28:38 +01:00
chat.h	model : add EXAONE MoE (#18543)	2026-01-13 23:28:38 +01:00
common.cpp	context : reserve new scheduler when graph topology changes (#18547)	2026-01-15 16:39:17 +02:00
common.h	llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00
console.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
console.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
debug.cpp	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)	2026-01-14 20:29:35 +01:00
debug.h	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914)	2026-01-14 20:29:35 +01:00
download.cpp	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
download.h	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
http.h	common: introduce http.h for httplib-based client (#16373)	2025-10-01 20:22:18 +03:00
json-partial.cpp	common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932)	2025-11-18 18:54:15 +01:00
json-partial.h	sync : vendor (#13901)	2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
json-schema-to-grammar.h	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
llguidance.cpp	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
log.cpp	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
log.h	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
ngram-cache.cpp	ggml : portability fixes for VS 2017 (#12150)	2025-03-04 18:53:26 +02:00
ngram-cache.h	llama : use LLAMA_TOKEN_NULL (#11062)	2025-01-06 10:52:15 +02:00
peg-parser.cpp	common : add nemotron 3 parsing (#18077)	2025-12-16 04:05:23 -06:00
peg-parser.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
preset.cpp	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00
preset.h	common: support remote preset (#18520)	2026-01-08 22:35:40 +01:00
regex-partial.cpp	common/grammar : replace problematic backtracking regex `[\s\S]*` (#18342)	2026-01-03 16:02:43 -06:00
regex-partial.h	`common`: add partial regex support (#12808)	2025-05-14 19:50:57 +01:00
sampling.cpp	llama : add adaptive-p sampler (#17927)	2026-01-15 19:16:29 +02:00
sampling.h	sampling : add support for backend sampling (#17004)	2026-01-04 22:22:16 +02:00
speculative.cpp	common : restore grammar-based rejection sampling (#18137)	2025-12-17 19:46:00 +02:00
speculative.h	server : implement universal assisted decoding (#12635)	2025-07-31 14:25:23 +02:00
unicode.cpp	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00
unicode.h	common : introduce composable PEG parser combinators for chat parsing (#17136)	2025-12-03 12:45:32 +02:00