Commit Graph

743 Commits

ddh0 60235724cf
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-17 22:07:22 -06:00
ddh0 775299892e add `use_power_law` flag + logic, minor cleanup 2025-12-17 15:06:05 -06:00
Johannes Gäßler 8dcc3662a2
llama-fit-params: fix memory print (#18136) 2025-12-17 21:10:03 +01:00
Georgi Gerganov 4301e27319
common : restore grammar-based rejection sampling (#18137)
* common : restore grammar-based rejection sampling

* sampling : allow null samplers
2025-12-17 19:46:00 +02:00
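
The entry above restores grammar-based rejection sampling in the common sampler. As a rough, hypothetical sketch of the general technique (not the code in #18137), the idea is to sample a candidate from the unconstrained distribution first and only mask the full vocabulary against the grammar when that candidate is rejected; `sample_with_grammar`, `accepts`, and `token_t` below are made-up names.

```cpp
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Hypothetical illustration of grammar-based rejection sampling:
// sample cheaply from the unconstrained distribution first, and only
// apply the (more expensive) grammar mask to every candidate if the
// first pick is rejected. Not the actual llama.cpp implementation.
using token_t = int32_t;

token_t sample_with_grammar(std::vector<float> probs,                     // unconstrained probabilities
                            const std::function<bool(token_t)> & accepts, // grammar acceptance check
                            std::mt19937 & rng) {
    std::discrete_distribution<token_t> dist(probs.begin(), probs.end());
    const token_t candidate = dist(rng);
    if (accepts(candidate)) {
        return candidate; // fast path: grammar evaluated for a single token
    }
    // slow path: zero out every token the grammar rejects, then resample
    for (token_t id = 0; id < (token_t) probs.size(); ++id) {
        if (!accepts(id)) {
            probs[id] = 0.0f;
        }
    }
    std::discrete_distribution<token_t> masked(probs.begin(), probs.end());
    return masked(rng);
}
```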
Tarek Dakhran 982060fadc
model: fix LFM2_MOE missing tensors (#18132) 2025-12-17 12:17:11 +01:00
ddh0 27dda80dd7
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-16 20:44:52 -06:00
Johannes Gäßler d0794e89d9
llama-fit-params: force disable mlock (#18103) 2025-12-17 00:50:12 +01:00
Johannes Gäßler 9dcac6cf9f
llama-fit-params: lower ctx size for multi GPU (#18101) 2025-12-17 00:49:34 +01:00
Johannes Gäßler 0e49a7b8b4
llama-fit-params: fix underflow for dense models (#18095) 2025-12-17 00:47:37 +01:00
ddh0 58aa1c6f5a
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-16 13:33:03 -06:00
Xuan-Son Nguyen ef83fb8601
model: fix LFM2 missing tensors (#18105) 2025-12-16 19:07:43 +01:00
Johannes Gäßler ec98e20021
llama: fix early stop in params_fit if ctx is set (#18070) 2025-12-16 14:24:00 +01:00
Xuan-Son Nguyen 7f2b2f3c77
arch: refactor LLM_TENSOR_NAMES (#18051)
* arch: refactor LLM_TENSOR_NAMES

* update docs

* typo

* fix LLM_ARCH_NEMOTRON_H_MOE

* show more meaningful error message on missing tensor

* fix and tested LLM_ARCH_NEMOTRON_H_MOE
2025-12-16 13:22:30 +01:00
Piotr Wilkin (ilintar) a5251ca11d
Optimization: Qwen3 next autoregressive pass (#17996)
* It's Qwen3 Next, the lean mean token generation machine!

* Apply patches from thread

* Remove recurrent version, only keep chunked and autoregressive

* Remove unnecessary conts and asserts

* Remove more extra conts and asserts

* Cleanup masking
2025-12-16 11:59:53 +01:00
Xuan-Son Nguyen 3d86c6c2b5
model: support GLM4V vision encoder (#18042)
* convert ok

* no deepstack

* less new tensors

* cgraph ok

* add mrope for text model

* faster patch merger

* add GGML_ROPE_TYPE_MRNORM

* add support for metal

* move glm4v to dedicated graph

* convert: add norm_embd

* clip: add debugging fn

* working correctly

* fix style

* use bicubic

* fix mrope metal

* improve cpu

* convert to neox ordering on conversion

* revert backend changes

* force stop if using old weight

* support moe variant

* fix conversion

* fix convert (2)

* Update tools/mtmd/clip-graph.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* process mrope_section on TextModel base class

* resolve merge conflict

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 11:25:26 +01:00
Chris Peterson 2aa45ef9e3
llama: Include algorithm header needed for C++23 (#18078) 2025-12-16 09:37:55 +02:00
Georgi Gerganov c560316440
graph : reuse SSM graphs (#16490)
* graph : reuse hybrid graphs

* graph : reuse recurrent graphs

* graph : fix reuse check for recurrent inputs

* memory : move the recurrent state into the memory context

* Revert "memory : move the recurrent state into the memory context"

This reverts commit 00f115fe810815d4a22a6dee0acc346131e970e1.

* cont : fix build
2025-12-16 09:36:21 +02:00
Daniel Bevenius 2995341730
llama : add support for NVIDIA Nemotron 3 Nano (#18058)
* llama : add support for NVIDIA Nemotron Nano 3

This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling
the conversion and running of this model.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 07:19:26 +01:00
ddh0 fcb5129086 remove debug logging, explicitly clamp params at init 2025-12-15 21:42:29 -06:00
ddh0 85b6e52e39
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-15 21:23:25 -06:00
ddh0 1c2d2e900d simplify target computation
last commit with debug logging!
2025-12-15 21:02:11 -06:00
HelloKS 9d52f17ae3
model : add KORMo model (#18032)
* vocab: add KORMo Tokenizer

* model: add KORMoForCausalLM

* vocab: change pretokenizer to qwen2

* lint: fix unintended line removal

* model: make qwen2 bias tensor optional

* model: use qwen2 architecture for KORMo
2025-12-15 18:51:43 +01:00
ssweens 4529c660c8
kv-cache: Fix state restore fragmented cache (#17982)
* kv-cache : fix state restore with fragmented cache (#17527)

Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.

* tests : update logic

* cleanup: tightened state_read_meta sig, added is_contiguous case

* fix: state_read_meta arg reorder loose ends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-15 19:28:35 +02:00
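
The fix above lets `find_slot` allocate non-contiguous cells during state restore. The toy program below (hypothetical names, not the llama.cpp kv-cache code) shows why that matters: on a fragmented cache a contiguous-only search can fail even though enough free cells exist.

```cpp
#include <cstdio>
#include <optional>
#include <vector>

// Toy model of the fragmented-cache restore problem: a cache is a row
// of cells, some already occupied. A contiguous-only search fails on a
// fragmented cache, while collecting free cells individually succeeds
// whenever the total free count is sufficient.
static std::optional<size_t> find_slot_contiguous(const std::vector<bool> & used, size_t n) {
    size_t run = 0;
    for (size_t i = 0; i < used.size(); ++i) {
        run = used[i] ? 0 : run + 1;
        if (run == n) {
            return i + 1 - n; // start index of the contiguous run
        }
    }
    return std::nullopt;
}

static std::vector<size_t> find_slot_any(const std::vector<bool> & used, size_t n) {
    std::vector<size_t> cells;
    for (size_t i = 0; i < used.size() && cells.size() < n; ++i) {
        if (!used[i]) {
            cells.push_back(i);
        }
    }
    return cells.size() == n ? cells : std::vector<size_t>{};
}

int main() {
    // fragmented cache: free cells exist, but never 3 in a row
    const std::vector<bool> used = {true, false, true, false, true, false, true};
    printf("contiguous slot found: %s\n", find_slot_contiguous(used, 3) ? "yes" : "no"); // no
    printf("non-contiguous cells found: %zu\n", find_slot_any(used, 3).size());          // 3
}
```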
ddh0 0344068cf1
remove extraneous logging 2025-12-15 09:35:44 -06:00
ddh0 9c50b573f5
improve logging messages in llama_sampler_power_law 2025-12-15 09:25:05 -06:00
ddh0 6e66095e1f
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-15 09:07:13 -06:00
Johannes Gäßler b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)
* llama: automatically fit args to free memory

llama-fit-params tool

* fix CI

* hints for bug reports, ensure no reallocation

* fix segfault with Vulkan

* add llama-fit-params to CI

* fix CI

* fix CI

* fix CI

* minor adjustments

* fix assignment of 1 dense layer

* fix logger not being reset on model load failure

* remove --n-gpu-layer hint on model load failure

* fix llama-fit-params verbosity

* fix edge case

* fix typo [no ci]
2025-12-15 09:24:59 +01:00
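
A back-of-the-envelope sketch of the kind of fitting #16653 automates: given the free VRAM and a per-layer cost estimate, pick the largest `--n-gpu-layers` that still leaves room for the context buffers. The function name and every number below are illustrative, not taken from `llama-fit-params`.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Illustrative only: choose how many layers to offload given free VRAM,
// an estimated per-layer footprint, and a reserved context budget.
int32_t fit_n_gpu_layers(int64_t free_vram_bytes,
                         int64_t bytes_per_layer,
                         int64_t ctx_buffer_bytes,
                         int32_t n_layers_total) {
    const int64_t budget = free_vram_bytes - ctx_buffer_bytes;
    if (budget <= 0 || bytes_per_layer <= 0) {
        return 0; // nothing fits after reserving the context buffers
    }
    const int64_t fit = budget / bytes_per_layer;
    return (int32_t) std::min<int64_t>(fit, n_layers_total);
}

int main() {
    // e.g. 12 GiB free, ~450 MiB per layer, 1.5 GiB reserved for context, 48-layer model
    const int32_t ngl = fit_n_gpu_layers(12LL << 30, 450LL << 20, 3LL << 29, 48);
    printf("suggested --n-gpu-layers: %d\n", ngl); // 23 with these toy numbers
}
```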
ddh0 4e04bd1ce2 log sampler init values 2025-12-14 23:14:51 -06:00
ddh0 4e28eb2ffe format (double) 2025-12-14 22:11:34 -06:00
ddh0 b5ed673ce9 fix logging 2025-12-14 22:08:36 -06:00
ddh0 493bf301ff silence `missing initializer for member` 2025-12-14 21:55:45 -06:00
ddh0 6934780669 optimize 2025-12-14 16:26:15 -06:00
ddh0 36b526d768
Merge branch 'master' into power-law-sampler 2025-12-14 15:43:49 -06:00
Xuan-Son Nguyen 0759b09c90
graph: add f_attn_temp_offset (#18025) 2025-12-14 13:05:59 +01:00
ddh0 ec54fe5f14 no, but does this? 2025-12-14 02:54:14 -06:00
ddh0 2a3f579d1f does this fix it? 2025-12-14 01:55:02 -06:00
ddh0 9613c48172 with logging 2025-12-14 00:36:59 -06:00
Georgi Gerganov 609a2d0268
models : fix YaRN regression + consolidate logic (#18006)
* models : fix YaRN regression + consolidate logic

* cont : fix the fix

* cont : remove header

* cont : add header
2025-12-14 08:34:56 +02:00
ddh0 a96ddd743a re-write + change parameters + simplify 2025-12-13 22:15:03 -06:00
ddh0 67a733670e
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-13 17:27:35 -06:00
Jeff Bolz 5266379bca
llama_context: synchronize before reallocating output buffer (#17974) 2025-12-13 09:19:51 -06:00
ddh0 1879fc6dc6
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-13 01:17:53 -06:00
ddh0 824bb3aa6e fix compiler warning, add commented-out logging per token 2025-12-13 00:23:15 -06:00
ddh0 0a19a3fd6c remove old debug log, style nit 2025-12-12 23:45:45 -06:00
ddh0 94cb883ed9 copy from author
ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
2025-12-12 23:19:08 -06:00
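
For context on the branch itself: a power-law sampler reshapes the token distribution with a power transform. The sketch below is only a generic example of that family (raise each probability to an exponent `alpha` and renormalize); the actual sampler on this branch follows the linked gist and its `target` parameter, which is not reproduced here.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Generic power-law reweighting of a token distribution. This is not
// the gist's algorithm, just an illustration of the transform family:
// alpha > 1 sharpens the distribution, alpha < 1 flattens it.
void power_law_reweight(std::vector<float> & probs, float alpha) {
    double sum = 0.0;
    for (float & p : probs) {
        p = (float) std::pow((double) p, (double) alpha);
        sum += p;
    }
    if (sum <= 0.0) {
        return; // degenerate input, nothing to normalize
    }
    for (float & p : probs) {
        p = (float) (p / sum);
    }
}

int main() {
    std::vector<float> probs = {0.5f, 0.3f, 0.2f};
    power_law_reweight(probs, 2.0f);
    for (float p : probs) {
        printf("%.3f ", p); // 0.658 0.237 0.105
    }
    printf("\n");
}
```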
Georgi Gerganov 7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency (#17945)
* models : fix the attn_factor for mistral3 graphs

* cont : rework attn_factor correction logic

* cont : make deepseek2 consistent

* cont : add TODO

* cont : special-case DSv2

* cont : revert Mistral 3 Large changes

* cont : fix DS2 to use the original attn_factor

* cont : minor comments
2025-12-12 17:12:40 +02:00
ddh0 2d62bbea9f remove `target_range` param, make `target == 1` no-op, cleanup code 2025-12-11 22:43:10 -06:00
ddh0 b3aea57768 minor 2025-12-11 16:48:52 -06:00
ddh0 93169593b8 remove old unused code from algorithm 2025-12-11 16:46:17 -06:00
ddh0 4959878a74 improved comments 2025-12-11 16:27:14 -06:00