Commit Graph

840 Commits

Author SHA1 Message Date
ddh0 521a13e6c6 correct fallback logic 2026-02-16 13:14:05 -06:00
ddh0 aaf010edeb new function `llama_tensor_update_stats` 2026-02-16 12:20:16 -06:00
ddh0 0c976fafd7 Merge branch 'ggml-org:master' into llama-quant-refactor 2026-02-16 11:00:49 -06:00
Saurabh Dash 5f28c53d11
model: Add support for Tiny Aya Models (#19611)
* changes for tiny aya

* changes to hash

* changes to vocab

* fix some tokenizer regex edge cases

* update comment

* add some comments for regex

* Apply suggestion from @ngxson

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-02-16 16:28:46 +01:00
Georgi Gerganov cc45f2ada6
models : deduplicate delta-net graphs for Qwen family (#19597)
* models : add llm_build_delta_net_base

* cont : keep qwen35 and qwen35moe graphs intact

* cont : add comments
2026-02-16 14:35:04 +02:00
Georgi Gerganov d5dfc33027
graph : fix KQ mask, lora, cvec reuse checks (#19644)
* graph : fix KQ mask reuse condition

* cont : dedup KQ mask build and can_reuse

* cont : fix build

* graph : fix adapter check for reuse
2026-02-16 09:21:11 +02:00
Georgi Gerganov 341bc7d23c context : fix output reorder with backend sampling (#19638) 2026-02-15 14:57:40 +02:00
ddh0 f14fd0c7f2 Merge branch 'ggml-org:master' into llama-quant-refactor 2026-02-14 22:36:18 -06:00
Georgi Gerganov 1725e316c1
models : optimize qwen3next graph (#19375)
* models : optimizing qwen3next graph

* cont

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* cont : remove redundant q, g chunking

* minor

* minor

* avoid passing masks around

* avoid concats during chunking

* naming + shapes

* update names and use prefix to disable CUDA graphs
2026-02-14 12:57:36 +02:00
agent-enemy-2 2d8015e8a4
llama : update LoRA API. + fix excessive graph reserves (#19280)
* Refactoring to use new llama_put_adapter_loras

* cont : alternative lora API

---------

Co-authored-by: Jake Chavis <jakechavis6@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-02-14 10:06:27 +02:00
George eb145c0753
mmap: Fix Windows handle lifetime (#19598)
* ggml: added cleanups in ggml_quantize_free
Add missing cleanup calls for IQ2_S, IQ1_M quantization types and IQ3XS with 512 blocks during quantization cleanup.

* mmap: Fix Windows handle lifetime
Move hMapping from local variable to member variable so it stays alive for the entire lifetime of the mapping.
The file mapping handle must remain valid until UnmapViewOfFile is called.
Fixes cleanup order in destructor.

* Update llama-mmap.cpp

* Update llama-mmap.cpp

Remove trailing whitespace from line 567
2026-02-14 10:05:12 +02:00
ddh0 a3bf07ea05 Merge branch 'ggml-org:master' into llama-quant-refactor 2026-02-13 21:29:21 -06:00
ddh0 7b127e126a correct function names 2026-02-13 21:17:53 -06:00
ddh0 bddc67547f correct function names 2026-02-13 21:13:53 -06:00
Xuan-Son Nguyen 752584d5f5
model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) (#19460)
* model: support GLM MoE DSA arch

* working version

* pyright

* keep indexer tensors

* add indexer gguf params

* loaded now

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* update

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* minor fix and cleanup

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-13 14:56:53 +01:00
ymcki 33a56f90a6
model : Kimi Linear fix conv state update (#19531)
* fix conv state update for llama-server parallel serving

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-02-13 09:10:18 +01:00
Adrien Gallouët 25224c8021
llama : remove deprecated codecvt (#19565)
Using the same conversion function ensures a consistent matching between
the regex pattern and the text.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-13 06:43:53 +01:00
Georgi Gerganov bb96bfd361 memory : fix kv cache size for hybrid models (#19559) 2026-02-13 07:36:24 +02:00
ddh0 97aefac773 update_stats guard 2026-02-12 20:00:23 -06:00
ddh0 053a28980b don't double-count `qs` 2026-02-12 18:31:59 -06:00
ddh0 fd3787ee05 typo 2026-02-12 18:24:47 -06:00
ddh0 d648629f56 remove unused `std::vector<ggml_tensor*> tensors;` 2026-02-12 18:24:16 -06:00
ddh0 6734e77662 don't throw by pointer; unify MiB formatting 2026-02-12 18:22:52 -06:00
ddh0 1f25c130de pretty error msg 2026-02-12 18:11:44 -06:00
ddh0 67e25bbae1 fix compile errors 2026-02-12 18:02:40 -06:00
ddh0 5d6c92440c initial commit for branch 2026-02-12 17:52:59 -06:00
ddh0 f58de63ec3 remove unused `params` parameter 2026-02-11 22:30:06 -06:00
ddh0 44f9fee248 remove per @compilade 2026-02-11 22:23:10 -06:00
ddh0 40528248fc comment ref #12557 2026-02-11 22:18:56 -06:00
ddh0 1658228d6a add back Q2_K edge case for imatrix 2026-02-11 21:53:07 -06:00
ddh0 1ccd7a49ba simplify for style 2026-02-11 21:41:37 -06:00
ddh0 ae786b862d simplify and rename `tensor_type_requires_imatrix` 2026-02-11 21:21:40 -06:00
ddh0 22db76409b add missing `GGML_TYPE`s 2026-02-11 21:14:19 -06:00
ddh0 55dbee2bbe fixup tensor_requires_imatrix 2026-02-11 21:03:34 -06:00
ddh0 3211a847ef logic error 2026-02-11 20:58:52 -06:00
ddh0 ea8da0503c missing __func__, move imatrix flag set 2026-02-11 20:57:16 -06:00
ddh0 2769f35207 new function `tensor_requires_imatrix`, add courtesy warning about imatrix 2026-02-11 20:49:05 -06:00
ddh0 966b21a981 show model and quant BPW when quant completes 2026-02-11 15:30:12 -06:00
ddh0 b9b32f0d2d no need to re-calculate ggml_nbytes for tensor 2026-02-11 14:45:44 -06:00
ddh0 c3f42dedd1 use 6 characters for tensor dims (cont.) 2026-02-11 14:29:22 -06:00
ddh0 56c27b13ad add --dry-run to llama-quantize 2026-02-11 14:08:17 -06:00
ddh0 0d22288f00 use 6 characters for tensor dims 2026-02-11 14:08:01 -06:00
ddh0 844ad3e326 clean slate for branch 2026-02-11 12:47:13 -06:00
Georgi Gerganov 6d95707827 model : fix wavtokenizer embedding notions (#19479) 2026-02-11 07:52:20 +02:00
Daniel Bevenius 2cce9fddb7
llama : refactor sampling_info to use buffer_view template (#19368)
* llama : refactor sampling_info to use buffer_view template

This commit updates the sampling_info struct in llama-context to use a
buffer_view template for the logits, probs, sampled tokens, and
candidates buffers.

The motivation for this is to simplify the code, improve type safety
and readability.
2026-02-11 05:38:13 +01:00
JJJYmmm fc0fe40049
models : support qwen3.5 series (#19468)
* support qwen3.5 series

* remove deepstack for now, and some code clean

* code clean

* add FULL_ATTENTION_INTERVAL metadata

* code clean

* reorder v heads for linear attention to avoid expensive interleaved repeat
2026-02-10 18:00:26 +02:00
Georgi Gerganov 972f323e73
revert : "[Model] Qwen3.5 dense and MoE support (no vision) (#19435)" (#19453)
This reverts commit 39bf692af1.
2026-02-09 14:57:51 +02:00
Piotr Wilkin (ilintar) 39bf692af1
[Model] Qwen3.5 dense and MoE support (no vision) (#19435)
* Unified delta net handling

* Remove old methods.

* Refactor and optimize

* Adapt autoregressive version from @ymcki

* Change to decay mask approach

* Fix bad permute

* Qwen 3.5 support

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Further fixes

* Use inheritance, remove unneeded conts

* Not like this!

* Remove ggml.h explicit import

* Remove transformers, fix the views

* ACTUALLY fix views, make super calls explicit in conversion.

* Fix conversion again

* Remove extra ggml.h imports

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-09 00:24:08 +01:00
forforever73 b83111815e
model : support Step3.5-Flash (#19283)
* Support Step3.5-Flash

* fix: norm.weight + 1 (HF zero_centered=true)

* step35: simplify GGUF conversion + drop redundant rope KVs

* Address review feedback

* rename limits -> clamp

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Apply suggestion from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* rename swiglu limits -> swiglu clamp in LLM_KV

* avoid CI fail

* Apply suggestions from code review

* Apply suggestions from code review

* disabled KV shifting for LLM_ARCH_STEP35

* Apply suggestions from code review

* mistakenly removed cmath

* add model size && apply missed suggestion

* assert partial_rotary_factors

* fix CI errors:

* load freq_base_swa

---------

Co-authored-by: lvyichen <lvyichen@stepfun.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-06 21:06:14 +01:00
Lasse Lauwerys 06bf3796f4
unicode : MSVC regex fix (#19340)
* Fix model loading regex error

* Change comments

* Use const_iterator and remove specializations

---------

Co-authored-by: Alde Rojas <hello@alde.dev>
2026-02-06 15:56:13 +02:00