Commit Graph

71 Commits

Author SHA1 Message Date
ddh0 080749909e minor style fixes
no functional changes
2025-12-30 15:47:14 -06:00
ddh0 c6a6f6354b update `SHARPNESS` constant to `10.0f` 2025-12-30 13:49:06 -06:00
ddh0 2d67b1c008 Merge branch 'ggml-org:master' into power-law-sampler 2025-12-30 13:44:42 -06:00
Jay Zenith c32fa21db8 sampling: reuse token data buffer in llama_sampler_sample (#18365)
* sampling: reuse token data buffer in llama_sampler_sample

* move cur buffer before timing section, after samplers

* minor : fix build

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-30 16:27:49 +02:00
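
The commit above avoids reallocating the candidate buffer on every sampling call. A minimal sketch of that reuse pattern, assuming a hypothetical `sampler_state` holder and `build_candidates` helper (only `llama_token_data` and `llama_token_data_array` come from llama.h; the real change lives in common/sampling.cpp):

```cpp
#include <vector>
#include "llama.h"

struct sampler_state {
    std::vector<llama_token_data> cur; // reused across calls, grows once
};

static llama_token_data_array build_candidates(sampler_state & st,
                                               const float * logits,
                                               int n_vocab) {
    st.cur.resize(n_vocab); // effectively free after the first call at this size
    for (int i = 0; i < n_vocab; ++i) {
        st.cur[i] = llama_token_data { (llama_token) i, logits[i], 0.0f };
    }
    return llama_token_data_array { st.cur.data(), st.cur.size(), -1, false };
}
```
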
ddh0 e7a892065d fix cold start EMA
- `ctx->weighted_sum` is now initialized and reset to `target / (1.0f - clamped_decay)`
- `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f - clamped_decay)`

this fixes a "cold start" problem with the moving average
2025-12-28 20:33:09 -06:00
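
For reference, a minimal sketch of the warm-started EMA this commit describes. The two initialization formulas are taken verbatim from the message above; the `ema_state` struct and the update rule are assumptions consistent with them:

```cpp
// hypothetical container for the two accumulators named in the commit
struct ema_state {
    float weighted_sum;
    float total_weight;
};

// seed both accumulators at their steady-state values so the first
// readings already average to `target` instead of being dragged toward
// zero (the "cold start" problem this commit fixes)
static void ema_reset(ema_state & s, float target, float clamped_decay) {
    s.weighted_sum = target / (1.0f - clamped_decay);
    s.total_weight = 1.0f   / (1.0f - clamped_decay);
}

// assumed update rule, consistent with the initialization above
static float ema_update(ema_state & s, float x, float clamped_decay) {
    s.weighted_sum = clamped_decay * s.weighted_sum + x;
    s.total_weight = clamped_decay * s.total_weight + 1.0f;
    return s.weighted_sum / s.total_weight; // current moving average
}
```

With this seeding, `ema_update` returns exactly `target` until real observations start to move the average, rather than climbing up from zero.
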
ddh0 b95b0884dd update `power-law` -> `adaptive-p` 2025-12-27 02:10:20 -06:00
ddh0 60235724cf Merge branch 'ggml-org:master' into power-law-sampler 2025-12-17 22:07:22 -06:00
ddh0 775299892e add `use_power_law` flag + logic, minor cleanup 2025-12-17 15:06:05 -06:00
Georgi Gerganov 4301e27319 common : restore grammar-based rejection sampling (#18137)
* common : restore grammar-based rejection sampling

* sampling : allow null samplers
2025-12-17 19:46:00 +02:00
ddh0 fcb5129086 remove debug logging, explicitly clamp params at init 2025-12-15 21:42:29 -06:00
ddh0 1c2d2e900d simplify target computation
last commit with debug logging!
2025-12-15 21:02:11 -06:00
ddh0 0344068cf1 remove extraneous logging 2025-12-15 09:35:44 -06:00
ddh0 9c50b573f5 improve logging messages in llama_sampler_power_law 2025-12-15 09:25:05 -06:00
ddh0 4e04bd1ce2 log sampler init values 2025-12-14 23:14:51 -06:00
ddh0 4e28eb2ffe format (double) 2025-12-14 22:11:34 -06:00
ddh0 b5ed673ce9 fix logging 2025-12-14 22:08:36 -06:00
ddh0 493bf301ff silence `missing initializer for member` 2025-12-14 21:55:45 -06:00
ddh0 6934780669 optimize 2025-12-14 16:26:15 -06:00
ddh0 ec54fe5f14 no, but does this? 2025-12-14 02:54:14 -06:00
ddh0 2a3f579d1f does this fix it? 2025-12-14 01:55:02 -06:00
ddh0 9613c48172 with logging 2025-12-14 00:36:59 -06:00
ddh0 a96ddd743a re-write + change parameters + simplify 2025-12-13 22:15:03 -06:00
ddh0 824bb3aa6e fix compiler warning, add commented-out logging per token 2025-12-13 00:23:15 -06:00
ddh0 0a19a3fd6c remove old debug log, style nit 2025-12-12 23:45:45 -06:00
ddh0 94cb883ed9 copy from author
ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
2025-12-12 23:19:08 -06:00
ddh0 2d62bbea9f remove `target_range` param, make `target == 1` no-op, cleanup code 2025-12-11 22:43:10 -06:00
ddh0 b3aea57768 minor 2025-12-11 16:48:52 -06:00
ddh0 93169593b8 remove old unused code from algorithm 2025-12-11 16:46:17 -06:00
ddh0 4959878a74 improved comments 2025-12-11 16:27:14 -06:00
ddh0 ffe163911b add args, rename `queue_size` -> `window_size` 2025-12-11 15:16:11 -06:00
ddh0 374bfd4363 explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]` 2025-12-11 14:22:58 -06:00
ddh0 88fb0f3f32 add params to `struct common_params_sampling`, add reference to PR 2025-12-11 13:47:51 -06:00
ddh0 5ab4ff7e44 simplify constants 2025-12-10 22:30:14 -06:00
ddh0 774cf23ee5 initial commit for branch 2025-12-10 22:13:58 -06:00
Georgi Gerganov 196f5083ef common : more accurate sampling timing (#17382)
* common : more accurate sampling timing

* eval-callback : minor fixes

* cont : add time_meas impl

* cont : fix log msg [no ci]

* cont : fix multiple definitions of time_meas

* llama-cli : exclude chat template init from time measurement

* cont : print percentage of unaccounted time

* cont : do not reset timings
2025-11-20 13:40:10 +02:00
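
The `time_meas` helper this PR adds accumulates elapsed time into a counter via RAII. A self-contained sketch of the idea, using std::chrono instead of ggml's timer; treat the details as illustrative rather than the actual implementation:

```cpp
#include <chrono>
#include <cstdint>

// RAII accumulator: adds the lifetime of the object to `t_acc_us`
struct time_meas {
    explicit time_meas(int64_t & t_acc_us, bool disable = false)
        : t_acc(t_acc_us), t_start(disable ? -1 : now_us()) {}

    ~time_meas() {
        if (t_start >= 0) {
            t_acc += now_us() - t_start; // accumulate on scope exit
        }
    }

    static int64_t now_us() {
        using namespace std::chrono;
        return duration_cast<microseconds>(
            steady_clock::now().time_since_epoch()).count();
    }

    int64_t & t_acc;   // running total, microseconds
    int64_t   t_start; // -1 when timing is disabled
};

// usage: only the sampling call inside the scope is measured
// int64_t t_sample_us = 0;
// { time_meas tm(t_sample_us); /* ... run samplers ... */ }
```
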
Marek Hradil jr. 6cd0cf72ce fix : Dangling pointer for non-empty trigger words in lazy grammar construction (#17048)
* fix : Dangling pointer for non-empty trigger words in llama_sampler_init_grammar_impl (#17047)

* Replace the 'static' workaround by keeping the variable in scope for longer

* Create std::array directly and pass into llama_grammar_init_impl

* Add back the trigger pattern

* Missed array include
2025-11-14 14:35:26 +02:00
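
The bug being fixed is the classic dangling-pointer-into-a-temporary pattern, and the fix keeps the owning `std::array` alive in the caller's scope for the duration of the call. An illustrative sketch with hypothetical names (the real callee is `llama_grammar_init_impl`):

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// stand-in for llama_grammar_init_impl: borrows the pointers, does not own them
static void init_grammar(const char * const * triggers, size_t n_triggers) {
    for (size_t i = 0; i < n_triggers; ++i) {
        printf("trigger word: %s\n", triggers[i]);
    }
}

void call_site(const char * w0, const char * w1) {
    // The array lives until the end of this scope, so the raw pointers the
    // callee receives stay valid for the whole call, without pointing into
    // a temporary (dangles) or into a `static` (the earlier workaround).
    std::array<const char *, 2> trigger_words { w0, w1 };
    init_grammar(trigger_words.data(), trigger_words.size());
}
```
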
Georgi Gerganov 81086cd6a3 vocab : mark EOT token for Granite models (#16499)
* vocab : mark EOT token for Granite models

* sampling : fallback to EOS when EOT is not found
2025-10-10 17:17:31 +03:00
Georgi Gerganov cdedb70a99 sampling : optimize dist sampler (#15704)
ggml-ci
2025-09-03 18:16:26 +03:00
Georgi Gerganov e92d53b29e sampling : optimize samplers by reusing bucket sort (#15665)
* sampling : optimize sorting using bucket sort in more places

ggml-ci

* sampling : do not sort in dist sampler

ggml-ci

* sampling : avoid heap allocations for sort buffers

ggml-ci

* common : add option to sort sampling candidates by probability

ggml-ci

* sampling : revert the change for preserving sort buffers

* sampling : use std::copy instead of memcpy

* sampling : clarify purpose of partial sort helpers

ggml-ci

* cont : remove wrong comment [no ci]

* common : update comment

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-08-31 20:41:02 +03:00
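
A rough sketch of the bucket-sort idea behind this optimization: logits are histogrammed into coarse buckets so that only the tokens in the top buckets ever need a comparison sort. This is a simplified stand-alone version, not the actual helpers in src/llama-sampling.cpp:

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// return the k largest logits, sorted descending (simplified sketch)
std::vector<float> top_k_bucketed(const std::vector<float> & logits, size_t k) {
    if (logits.empty() || k == 0) return {};
    const auto [mn_it, mx_it] = std::minmax_element(logits.begin(), logits.end());
    const float mn = *mn_it;
    const float mx = *mx_it;
    const int   n_buckets = 128;
    const float scale = (mx > mn) ? (n_buckets - 1) / (mx - mn) : 0.0f;

    // histogram pass: O(n), no element-vs-element comparisons
    std::vector<std::vector<float>> buckets(n_buckets);
    for (float v : logits) {
        const int b = std::min((int) ((v - mn) * scale), n_buckets - 1);
        buckets[b].push_back(v);
    }
    // walk from the largest bucket down, sorting only what we keep
    std::vector<float> out;
    out.reserve(k);
    for (int b = n_buckets - 1; b >= 0 && out.size() < k; --b) {
        std::sort(buckets[b].begin(), buckets[b].end(), std::greater<float>());
        for (float v : buckets[b]) {
            if (out.size() == k) break;
            out.push_back(v);
        }
    }
    return out;
}
```

For realistic logit distributions and small k, only a tiny fraction of candidates ever reach the sort, which is where the speedup comes from.
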
Georgi Gerganov f9cd68398b sampling : make sure samplers return at least 1 token (#13822)
* sampling : min-p should always return at least one token

ggml-ci

* sampling : same for typical sampling

* tests : sampling tests use min_keep == 0

ggml-ci
2025-05-27 12:07:52 +03:00
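
The guarantee this PR adds boils down to clamping each truncation sampler's keep-count so an aggressive threshold can never empty the candidate list. A one-function sketch, with `clamp_keep` a hypothetical helper rather than llama.cpp code:

```cpp
#include <algorithm>
#include <cstddef>

// keep everything that passed the filter (min-p, typical, ...), but never
// fewer than max(min_keep, 1) tokens and never more than exist
size_t clamp_keep(size_t n_pass, size_t min_keep, size_t n_candidates) {
    return std::min(n_candidates, std::max({n_pass, min_keep, (size_t) 1}));
}
```
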
DocShotgun ffc727203a sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345) 2025-05-06 22:36:24 +02:00
oobabooga 91a86a6f35 sampling : don't consider -infinity values in top_n_sigma (#13344) 2025-05-06 20:24:15 +02:00
oobabooga 233461f812 sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering

* revert: sampler ordering

* revert: VS' crappy auto-formatting

* revert: VS' crappy auto-formatting pt.2

* revert: my crappy eyesight...

* sampling: add XTC to Top-nσ sampler chain

* sampling: add Dyna. Temp. to Top-nσ sampler chain

* sampling: actually remove Top-nσ from sampler (oops)

* Integrate top_n_sigma into main sampler chain

* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA

* Formatting

* Lint

* Exit early in the sampler if nsigma < 0

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>
2025-05-05 22:12:19 +02:00
Georgi Gerganov d9d398f84f sampling : when top-k <= 0 -> noop (#13173)
ggml-ci
2025-04-29 20:22:57 +03:00
Johannes Gäßler dd373dd3bf llama: fix error on bad grammar (#12628) 2025-03-28 18:08:52 +01:00
Olivier Chafik 669912d9a5 `tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034)
* sampler: turn lazy grammar trigger words to regexes

* add scripts/tool_bench.sh & .py

* constrain llama JSON output regardless of function name if it matches at the beginning

* update relaxed newline space rule in grammar tests

* support add_generation_prompt query parameter (useful for /apply_template)

* Update src/llama-grammar.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-03-05 13:05:13 +00:00
Vinesh Janarthanan 27e8a23300 sampling: add Top-nσ sampler (#11223)
* initial sampling changes

* completed top nsigma sampler implementation

* apply parameter to only llama-cli

* updated readme

* added tests and fixed nsigma impl

* cleaned up pr

* format

* format

* format

* removed commented tests

* cleanup pr and remove explicit floats

* added top-k sampler to improve performance

* changed sigma to float

* fixed string format to float

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update common/sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Update src/llama-sampling.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* added llama_sampler_init

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-13 08:45:57 +02:00
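
Putting the Top-nσ entries above together: the sampler keeps only the tokens whose logit lies within n standard deviations of the maximum logit, with the later commits making it a no-op for `nsigma <= 0` or a single candidate and ignoring -infinity values. A simplified stand-alone sketch, operating on raw logits rather than a `llama_token_data_array`:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// keep tokens whose logit is within nsigma std-devs of the max logit
std::vector<float> top_n_sigma(const std::vector<float> & logits, float nsigma) {
    if (nsigma <= 0.0f || logits.size() <= 1) {
        return logits; // no-op cases, per the later fixes above
    }
    // statistics over finite logits only; -inf entries are ignored
    float  mx = -INFINITY, mean = 0.0f;
    size_t n  = 0;
    for (float v : logits) {
        if (std::isinf(v)) continue;
        mx = std::max(mx, v);
        mean += v;
        ++n;
    }
    if (n == 0) return logits;
    mean /= n;
    float var = 0.0f;
    for (float v : logits) {
        if (!std::isinf(v)) var += (v - mean) * (v - mean);
    }
    const float sigma = std::sqrt(var / n);

    std::vector<float> kept;
    for (float v : logits) {
        if (v >= mx - nsigma * sigma) kept.push_back(v); // -inf never passes
    }
    return kept;
}
```
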
Christian Fillion 7ee953a64a llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`. This is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator. C++
callers may not be sharing the same heap as libllama.
2025-02-07 11:33:27 +02:00
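
A sketch of the usage pattern this commit enables: the `llama_sampler` wrapper is allocated by libllama via `llama_sampler_init`, so the chain's eventual `llama_sampler_free` stays well-defined for custom samplers too. The interface fields follow `llama_sampler_i` in llama.h; the sampler body itself is a placeholder:

```cpp
#include <cstddef>
#include "llama.h"

static const char * my_name(const struct llama_sampler * /*smpl*/) {
    return "my-sampler";
}

static void my_apply(struct llama_sampler * /*smpl*/, llama_token_data_array * cur_p) {
    // placeholder: a real sampler would filter or reweight cur_p->data here
    for (size_t i = 0; i < cur_p->size; ++i) {
        cur_p->data[i].p = 0.0f;
    }
}

static const struct llama_sampler_i my_iface = {
    /*.name   =*/ my_name,
    /*.accept =*/ nullptr, // optional hooks may be left null
    /*.apply  =*/ my_apply,
    /*.reset  =*/ nullptr,
    /*.clone  =*/ nullptr,
    /*.free   =*/ nullptr, // no ctx to release in this sketch
};

// libllama allocates (and will later delete) the wrapper itself, so handing
// the result to a chain that calls llama_sampler_free on it is well-defined:
// struct llama_sampler * smpl = llama_sampler_init(&my_iface, /*ctx=*/nullptr);
```
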
Olivier Chafik 8b576b6c55 Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Georgi Gerganov afa8a9ec9b llama : add `llama_vocab`, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)

* llama : add struct llama_vocab to the API (#11156)

ggml-ci

* hparams : move vocab params to llama_vocab (#11159)

ggml-ci

* vocab : more pimpl (#11165)

ggml-ci

* vocab : minor tokenization optimizations (#11160)

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names (#11167)

ggml-ci

* llama : update API names to use correct prefix (#11174)

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
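
A small usage sketch of the renamed vocab API surface after this refactor, using the new names listed in the bullets above (`llama_vocab_n_tokens`, `llama_vocab_get_add_bos`); error handling is omitted:

```cpp
#include <cstdio>
#include "llama.h"

// print basic vocab info using the post-refactor names
static void print_vocab_info(const struct llama_model * model) {
    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    // old name: llama_vocab_n_vocab -> llama_vocab_n_tokens
    const int32_t n_tokens = llama_vocab_n_tokens(vocab);
    // old name: llama_vocab_add_bos -> llama_vocab_get_add_bos
    const bool add_bos = llama_vocab_get_add_bos(vocab);

    printf("n_tokens = %d, add_bos = %d\n", n_tokens, (int) add_bos);
}
```
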