Commit Graph

743 Commits

ddh0 60235724cf
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-17 22:07:22 -06:00
ddh0 775299892e add `use_power_law` flag + logic, minor cleanup 2025-12-17 15:06:05 -06:00
Johannes Gäßler 8dcc3662a2
llama-fit-params: fix memory print (#18136) 2025-12-17 21:10:03 +01:00
Georgi Gerganov 4301e27319
common : restore grammar-based rejection sampling (#18137)
* common : restore grammar-based rejection sampling

* sampling : allow null samplers
2025-12-17 19:46:00 +02:00
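
The entry above restores grammar-based rejection sampling in the common sampler. As a rough, hypothetical sketch of the general technique (not the code in #18137), the idea is to sample a candidate from the unconstrained distribution first and only mask the full vocabulary against the grammar when that candidate is rejected; `sample_with_grammar`, `accepts`, and `token_t` below are made-up names.

```cpp
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Hypothetical illustration of grammar-based rejection sampling:
// sample cheaply from the unconstrained distribution first, and only
// apply the (more expensive) grammar mask to every candidate if the
// first pick is rejected. Not the actual llama.cpp implementation.
using token_t = int32_t;

token_t sample_with_grammar(std::vector<float> probs,                     // unconstrained probabilities
                            const std::function<bool(token_t)> & accepts, // grammar acceptance check
                            std::mt19937 & rng) {
    std::discrete_distribution<token_t> dist(probs.begin(), probs.end());
    const token_t candidate = dist(rng);
    if (accepts(candidate)) {
        return candidate; // fast path: grammar evaluated for a single token
    }
    // slow path: zero out every token the grammar rejects, then resample
    for (token_t id = 0; id < (token_t) probs.size(); ++id) {
        if (!accepts(id)) {
            probs[id] = 0.0f;
        }
    }
    std::discrete_distribution<token_t> masked(probs.begin(), probs.end());
    return masked(rng);
}
```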
Tarek Dakhran 982060fadc
model: fix LFM2_MOE missing tensors (#18132) 2025-12-17 12:17:11 +01:00
ddh0 27dda80dd7
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-16 20:44:52 -06:00
Johannes Gäßler d0794e89d9
llama-fit-params: force disable mlock (#18103) 2025-12-17 00:50:12 +01:00
Johannes Gäßler 9dcac6cf9f
llama-fit-params: lower ctx size for multi GPU (#18101) 2025-12-17 00:49:34 +01:00
Johannes Gäßler 0e49a7b8b4
llama-fit-params: fix underflow for dense models (#18095) 2025-12-17 00:47:37 +01:00
ddh0 58aa1c6f5a
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-16 13:33:03 -06:00
Xuan-Son Nguyen ef83fb8601
model: fix LFM2 missing tensors (#18105) 2025-12-16 19:07:43 +01:00
Johannes Gäßler ec98e20021
llama: fix early stop in params_fit if ctx is set (#18070) 2025-12-16 14:24:00 +01:00
Xuan-Son Nguyen 7f2b2f3c77
arch: refactor LLM_TENSOR_NAMES (#18051)
* arch: refactor LLM_TENSOR_NAMES

* update docs

* typo

* fix LLM_ARCH_NEMOTRON_H_MOE

* show more meaningful error message on missing tensor

* fix and tested LLM_ARCH_NEMOTRON_H_MOE
2025-12-16 13:22:30 +01:00
Piotr Wilkin (ilintar) a5251ca11d
Optimization: Qwen3 next autoregressive pass (#17996)
* It's Qwen3 Next, the lean mean token generation machine!

* Apply patches from thread

* Remove recurrent version, only keep chunked and autoregressive

* Remove unnecessary conts and asserts

* Remove more extra conts and asserts

* Cleanup masking
2025-12-16 11:59:53 +01:00
Xuan-Son Nguyen 3d86c6c2b5
model: support GLM4V vision encoder (#18042)
* convert ok

* no deepstack

* less new tensors

* cgraph ok

* add mrope for text model

* faster patch merger

* add GGML_ROPE_TYPE_MRNORM

* add support for metal

* move glm4v to dedicated graph

* convert: add norm_embd

* clip: add debugging fn

* working correctly

* fix style

* use bicubic

* fix mrope metal

* improve cpu

* convert to neox ordering on conversion

* revert backend changes

* force stop if using old weight

* support moe variant

* fix conversion

* fix convert (2)

* Update tools/mtmd/clip-graph.h

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* process mrope_section on TextModel base class

* resolve merge conflict

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 11:25:26 +01:00
Chris Peterson 2aa45ef9e3
llama: Include algorithm header needed for C++23 (#18078) 2025-12-16 09:37:55 +02:00
Georgi Gerganov c560316440
graph : reuse SSM graphs (#16490)
* graph : reuse hybrid graphs

* graph : reuse recurrent graphs

* graph : fix reuse check for recurrent inputs

* memory : move the recurrent state into the memory context

* Revert "memory : move the recurrent state into the memory context"

This reverts commit 00f115fe810815d4a22a6dee0acc346131e970e1.

* cont : fix build
2025-12-16 09:36:21 +02:00
Daniel Bevenius 2995341730
llama : add support for NVIDIA Nemotron 3 Nano (#18058)
* llama : add support for NVIDIA Nemotron Nano 3

This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling
the conversion and running of this model.

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 07:19:26 +01:00
ddh0 fcb5129086 remove debug logging, explicitly clamp params at init 2025-12-15 21:42:29 -06:00
ddh0 85b6e52e39
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-15 21:23:25 -06:00
ddh0 1c2d2e900d simplify target computation
last commit with debug logging!
2025-12-15 21:02:11 -06:00
HelloKS 9d52f17ae3
model : add KORMo model (#18032)
* vocab: add KORMo Tokenizer

* model: add KORMoForCausalLM

* vocab: change pretokenizer to qwen2

* lint: fix unintended line removal

* model: make qwen2 bias tensor optional

* model: use qwen2 architecture for KORMo
2025-12-15 18:51:43 +01:00
ssweens 4529c660c8
kv-cache: Fix state restore fragmented cache (#17982)
* kv-cache : fix state restore with fragmented cache (#17527)

Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.

* tests : update logic

* cleanup: tightened state_read_meta sig, added is_contiguous case

* fix: state_read_meta arg reorder loose ends

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-15 19:28:35 +02:00
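
The fix above lets `find_slot` allocate non-contiguous cells during state restore. The toy program below (hypothetical names, not the llama.cpp kv-cache code) shows why that matters: on a fragmented cache a contiguous-only search can fail even though enough free cells exist.

```cpp
#include <cstdio>
#include <optional>
#include <vector>

// Toy model of the fragmented-cache restore problem: a cache is a row
// of cells, some already occupied. A contiguous-only search fails on a
// fragmented cache, while collecting free cells individually succeeds
// whenever the total free count is sufficient.
static std::optional<size_t> find_slot_contiguous(const std::vector<bool> & used, size_t n) {
    size_t run = 0;
    for (size_t i = 0; i < used.size(); ++i) {
        run = used[i] ? 0 : run + 1;
        if (run == n) {
            return i + 1 - n; // start index of the contiguous run
        }
    }
    return std::nullopt;
}

static std::vector<size_t> find_slot_any(const std::vector<bool> & used, size_t n) {
    std::vector<size_t> cells;
    for (size_t i = 0; i < used.size() && cells.size() < n; ++i) {
        if (!used[i]) {
            cells.push_back(i);
        }
    }
    return cells.size() == n ? cells : std::vector<size_t>{};
}

int main() {
    // fragmented cache: free cells exist, but never 3 in a row
    const std::vector<bool> used = {true, false, true, false, true, false, true};
    printf("contiguous slot found: %s\n", find_slot_contiguous(used, 3) ? "yes" : "no"); // no
    printf("non-contiguous cells found: %zu\n", find_slot_any(used, 3).size());          // 3
}
```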
ddh0 0344068cf1
remove extraneous logging 2025-12-15 09:35:44 -06:00
ddh0 9c50b573f5
improve logging messages in llama_sampler_power_law 2025-12-15 09:25:05 -06:00
ddh0 6e66095e1f
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-15 09:07:13 -06:00
Johannes Gäßler b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)
* llama: automatically fit args to free memory

llama-fit-params tool

* fix CI

* hints for bug reports, ensure no reallocation

* fix segfault with Vulkan

* add llama-fit-params to CI

* fix CI

* fix CI

* fix CI

* minor adjustments

* fix assignment of 1 dense layer

* fix logger not being reset on model load failure

* remove --n-gpu-layer hint on model load failure

* fix llama-fit-params verbosity

* fix edge case

* fix typo [no ci]
2025-12-15 09:24:59 +01:00
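
A back-of-the-envelope sketch of the kind of fitting #16653 automates: given the free VRAM and a per-layer cost estimate, pick the largest `--n-gpu-layers` that still leaves room for the context buffers. The function name and every number below are illustrative, not taken from `llama-fit-params`.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Illustrative only: choose how many layers to offload given free VRAM,
// an estimated per-layer footprint, and a reserved context budget.
int32_t fit_n_gpu_layers(int64_t free_vram_bytes,
                         int64_t bytes_per_layer,
                         int64_t ctx_buffer_bytes,
                         int32_t n_layers_total) {
    const int64_t budget = free_vram_bytes - ctx_buffer_bytes;
    if (budget <= 0 || bytes_per_layer <= 0) {
        return 0; // nothing fits after reserving the context buffers
    }
    const int64_t fit = budget / bytes_per_layer;
    return (int32_t) std::min<int64_t>(fit, n_layers_total);
}

int main() {
    // e.g. 12 GiB free, ~450 MiB per layer, 1.5 GiB reserved for context, 48-layer model
    const int32_t ngl = fit_n_gpu_layers(12LL << 30, 450LL << 20, 3LL << 29, 48);
    printf("suggested --n-gpu-layers: %d\n", ngl); // 23 with these toy numbers
}
```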
ddh0 4e04bd1ce2 log sampler init values 2025-12-14 23:14:51 -06:00
ddh0 4e28eb2ffe format (double) 2025-12-14 22:11:34 -06:00
ddh0 b5ed673ce9 fix logging 2025-12-14 22:08:36 -06:00
ddh0 493bf301ff silence `missing initializer for member` 2025-12-14 21:55:45 -06:00
ddh0 6934780669 optimize 2025-12-14 16:26:15 -06:00
ddh0 36b526d768
Merge branch 'master' into power-law-sampler 2025-12-14 15:43:49 -06:00
Xuan-Son Nguyen 0759b09c90
graph: add f_attn_temp_offset (#18025) 2025-12-14 13:05:59 +01:00
ddh0 ec54fe5f14 no, but does this? 2025-12-14 02:54:14 -06:00
ddh0 2a3f579d1f does this fix it? 2025-12-14 01:55:02 -06:00
ddh0 9613c48172 with logging 2025-12-14 00:36:59 -06:00
Georgi Gerganov 609a2d0268
models : fix YaRN regression + consolidate logic (#18006)
* models : fix YaRN regression + consolidate logic

* cont : fix the fix

* cont : remove header

* cont : add header
2025-12-14 08:34:56 +02:00
ddh0 a96ddd743a re-write + change parameters + simplify 2025-12-13 22:15:03 -06:00
ddh0 67a733670e
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-13 17:27:35 -06:00
Jeff Bolz 5266379bca
llama_context: synchronize before reallocating output buffer (#17974) 2025-12-13 09:19:51 -06:00
ddh0 1879fc6dc6
Merge branch 'ggml-org:master' into power-law-sampler 2025-12-13 01:17:53 -06:00
ddh0 824bb3aa6e fix compiler warning, add commented-out logging per token 2025-12-13 00:23:15 -06:00
ddh0 0a19a3fd6c remove old debug log, style nit 2025-12-12 23:45:45 -06:00
ddh0 94cb883ed9 copy from author
ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
2025-12-12 23:19:08 -06:00
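
For context on the branch itself: a power-law sampler reshapes the token distribution with a power transform. The sketch below is only a generic example of that family (raise each probability to an exponent `alpha` and renormalize); the actual sampler on this branch follows the linked gist and its `target` parameter, which is not reproduced here.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Generic power-law reweighting of a token distribution. This is not
// the gist's algorithm, just an illustration of the transform family:
// alpha > 1 sharpens the distribution, alpha < 1 flattens it.
void power_law_reweight(std::vector<float> & probs, float alpha) {
    double sum = 0.0;
    for (float & p : probs) {
        p = (float) std::pow((double) p, (double) alpha);
        sum += p;
    }
    if (sum <= 0.0) {
        return; // degenerate input, nothing to normalize
    }
    for (float & p : probs) {
        p = (float) (p / sum);
    }
}

int main() {
    std::vector<float> probs = {0.5f, 0.3f, 0.2f};
    power_law_reweight(probs, 2.0f);
    for (float p : probs) {
        printf("%.3f ", p); // 0.658 0.237 0.105
    }
    printf("\n");
}
```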
Georgi Gerganov 7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency (#17945)
* models : fix the attn_factor for mistral3 graphs

* cont : rework attn_factor correction logic

* cont : make deepseek2 consistent

* cont : add TODO

* cont : special-case DSv2

* cont : revert Mistral 3 Large changes

* cont : fix DS2 to use the original attn_factor

* cont : minor comments
2025-12-12 17:12:40 +02:00
ddh0 2d62bbea9f remove `target_range` param, make `target == 1` no-op, cleanup code 2025-12-11 22:43:10 -06:00
ddh0 b3aea57768 minor 2025-12-11 16:48:52 -06:00
ddh0 93169593b8 remove old unused code from algorithm 2025-12-11 16:46:17 -06:00
ddh0 4959878a74 improved comments 2025-12-11 16:27:14 -06:00