Commit Graph

173 Commits

Author SHA1 Message Date
Ed Addario 960ef96141
Prepare for future optimization algorithms 2026-01-01 13:44:59 +00:00
Ed Addario 91846ee79b
Change checkpoint file magic 2025-12-29 13:02:06 +00:00
Ed Addario b6d718a4a6
Add code comments 2025-12-25 15:47:44 +00:00
Ed Addario 5f7bba7828
Improve state checkpoint filename 2025-12-25 15:47:18 +00:00
Ed Addario dfa79a9484
Merge branch 'master' into quantize 2025-12-16 13:57:54 +01:00
Johannes Gäßler b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)
* llama: automatically fit args to free memory

llama-fit-params tool

* fix CI

* hints for bug reports, ensure no reallocation

* fix segfault with Vulkan

* add llama-fit-params to CI

* fix CI

* fix CI

* fix CI

* minor adjustments

* fix assignment of 1 dense layer

* fix logger not being reset on model load failure

* remove --n-gpu-layer hint on model load failure

* fix llama-fit-params verbosity

* fix edge case

* fix typo [no ci]
2025-12-15 09:24:59 +01:00
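
The #16653 entry above introduces llama-fit-params and automatic fitting of parameters to free memory. As a rough, hedged illustration of the idea only (the real logic also has to account for context size, dense layers and multi-GPU splits, and none of the names below come from the PR), one could back off the number of offloaded layers until the memory estimate fits:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: find the largest number of layers whose estimated
// VRAM usage still fits in the reported free memory.
static int fit_gpu_layers(int n_layers, uint64_t bytes_per_layer,
                          uint64_t overhead_bytes, uint64_t free_vram) {
    for (int n = n_layers; n >= 0; --n) {
        const uint64_t need = overhead_bytes + (uint64_t) n * bytes_per_layer;
        if (need <= free_vram) {
            return n;              // largest offload count that fits
        }
    }
    return 0;                      // nothing fits: keep everything on CPU
}

int main() {
    // e.g. 32 layers of ~450 MiB each plus ~1 GiB overhead vs 12 GiB free
    const int n = fit_gpu_layers(32, 450ull << 20, 1ull << 30, 12ull << 30);
    std::printf("offload %d layers\n", n);
    return 0;
}
```
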
Ed Addario e3d9b340ca
Merge branch 'master' into quantize 2025-12-06 15:07:36 +01:00
Daniel Bevenius 444f00b0ec
llama : remove quantization sanity check (#17788)
* llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that there are hybrid models that have
recurrent layers, expert layers, and attention layers. For these
models the current check fails as the expert layers are not taken
into account. After consideration, it was decided that this check
is not strictly necessary and can be removed to allow for more
flexible model architectures.

* llama : remove unused pruned_attention_w and is_clip_model vars
2025-12-06 12:26:20 +01:00
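
For context on the #17788 entry above, here is a minimal sketch of the kind of per-layer attention sanity check being removed, assuming a simple count of tensors per name; the names, counts and message are illustrative, not the original llama.cpp code. A check of this shape trips on hybrid models where some layers are recurrent or expert-only:

```cpp
#include <cstdio>
#include <initializer_list>
#include <map>
#include <string>

// Returns false when an attention tensor family is not present exactly
// once per layer - the assumption that breaks for hybrid architectures.
static bool attention_sanity_check(const std::map<std::string, int> & counts, int n_layer) {
    for (const char * name : { "attn_q", "attn_k", "attn_v" }) {
        const auto it = counts.find(name);
        if (it == counts.end() || it->second != n_layer) {
            std::fprintf(stderr, "expected %d '%s' tensors, found %d\n",
                         n_layer, name, it == counts.end() ? 0 : it->second);
            return false;
        }
    }
    return true;
}
```
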
Georgi Gerganov a67ef0f47f
llama : fix sanity checks during quantization (#17721) 2025-12-04 10:33:42 +02:00
Ed Addario 3f7842c645
Merge branch 'master' into quantize 2025-11-30 13:01:54 +00:00
Ed Addario 37cf51ebd0
Process bpw targets up to B/F16 2025-11-30 00:29:35 +00:00
Ed Addario 229109f329
Increase importance boost for final pass 2025-11-29 10:31:39 +00:00
Ed Addario 5b557ca958
Minor refactoring 2025-11-29 10:30:20 +00:00
Piotr Wilkin (ilintar) ff55414c42
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Ed Addario 6616008420
Use more descriptive option naming 2025-11-24 18:26:45 +00:00
Ed Addario 1c9993e131
Add --disable-tensor-importance option 2025-11-23 17:51:04 +00:00
Ed Addario 9ec3e6e262
Remove processing statistics_data 2025-11-23 17:49:53 +00:00
Ed Addario a0ba913613
Fix lambda capture bug in Windows and initialise candidate_types struct 2025-11-19 11:19:44 +00:00
Ed Addario ac8cfbdd12
Improved is_important() logic 2025-11-17 18:03:09 +00:00
Ed Addario b02b1b2304
Merge branch 'master' into quantize 2025-10-31 23:20:17 +00:00
Ed Addario c59bb6d49d
Add Euclidean-Cosine score to identify important tensors 2025-10-30 22:11:40 +00:00
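
The Euclidean-Cosine score commit above names a concrete metric without showing it; as a hedged sketch only (the weighting and the way the two measures are combined are assumptions, not the branch's actual formula), one way to fold Euclidean error and cosine similarity between a tensor's original and dequantized values into a single sensitivity score:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative combined score: higher means larger quantization error
// and/or a bigger change in direction, i.e. a more sensitive tensor.
static double euclidean_cosine_score(const std::vector<float> & a,
                                     const std::vector<float> & b) {
    double dot = 0.0, na = 0.0, nb = 0.0, dist2 = 0.0;
    const size_t n = a.size() < b.size() ? a.size() : b.size();
    for (size_t i = 0; i < n; ++i) {
        dot   += (double) a[i] * b[i];
        na    += (double) a[i] * a[i];
        nb    += (double) b[i] * b[i];
        const double d = (double) a[i] - b[i];
        dist2 += d * d;
    }
    const double cosine    = dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12);
    const double euclidean = std::sqrt(dist2);
    return euclidean * (1.0 - cosine);
}
```
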
Ed Addario 6e32244a06
Read statistics from imatrix 2025-10-30 21:53:07 +00:00
Jan Boon d7395115ba
llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
Ed Addario f8863b9a80
Minor refactoring 2025-10-28 15:22:32 +00:00
Ed Addario 5303212324
Simplify tensor selection 2025-10-26 17:40:52 +00:00
Ed Addario d6ccd5649a
Finetune heuristics 2025-10-25 12:09:20 +01:00
Ed Addario 04561d5782
Update epsilon specifier 2025-10-21 12:53:26 +01:00
Ed Addario 27bf25e93c
Fix lambda capture 2025-10-20 22:04:35 +01:00
Ed Addario 543b5a99db
Fix lambda capture 2025-10-20 21:57:03 +01:00
Ed Addario fa1df81d49
Finetune heuristics 2025-10-20 20:52:23 +01:00
Ed Addario 41a0069613
Merge branch 'master' into quantize 2025-10-16 22:20:04 +01:00
Ed Addario a5103933bb
Minor refactoring 2025-10-16 15:11:48 +01:00
Ed Addario 0b3e930d52
Add option to override bpw state file name 2025-10-16 11:41:26 +01:00
Ed Addario a6853ea2ae
Add tensor type and depth heuristics 2025-10-16 11:20:24 +01:00
Xuan-Son Nguyen 3e3cb19f64
llama-quant: add support for mmproj (#16592)
* llama-quant: add support for mmproj

* Update src/llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* check prefix instead

* small fix

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-15 14:48:08 +02:00
Ed Addario b7911f1431
Minor refactoring 2025-10-13 17:46:45 +01:00
Ed Addario cd734b89ce
Update quant types 2025-10-13 15:15:23 +01:00
Ed Addario b1b58e67df
Refactor signal handlers 2025-10-13 14:54:32 +01:00
Ed Addario ca282302b5
Add --keep-bpw-state option 2025-10-12 18:23:23 +01:00
Ed Addario b6094a97bf
Add quant types 2025-10-12 16:30:35 +01:00
Ed Addario 12e0524f3a
Reduce compute time by parallelising tensor processing - courtesy of https://github.com/ddh0 2025-10-12 15:12:15 +01:00
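
The parallelisation commit above only states the outcome; as an illustrative sketch (not the contributed implementation), per-tensor work can be spread over hardware threads with an atomic work index so each thread pulls the next unprocessed tensor:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct tensor_job { int id; /* tensor handle, target type, ... */ };

// Minimal work-stealing-free thread pool: each worker claims the next
// job index atomically and processes it until the queue is exhausted.
static void process_all(std::vector<tensor_job> & jobs, unsigned n_threads) {
    std::atomic<size_t> next{0};
    auto worker = [&jobs, &next]() {
        while (true) {
            const size_t i = next.fetch_add(1);
            if (i >= jobs.size()) break;
            // quantize / score jobs[i] here
            std::printf("processed tensor %d\n", jobs[i].id);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto & th : pool) th.join();
}
```
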
Ed Addario 5b0d3f6d5a
Automatically determine if bias error is significant 2025-10-11 10:04:48 +01:00
Ed Addario c93131cef6
Remove --no-bias option 2025-10-10 13:26:51 +01:00
Ed Addario 3a3d807fc3
Remove bias mode computation 2025-10-10 13:10:42 +01:00
Ed Addario c11184a3c1
Generate model ID hash 2025-10-09 11:58:01 +01:00
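
The model ID hash commit above suggests tying persisted bpw state to the model it was computed from; a hypothetical sketch using FNV-1a over tensor names and shapes follows (the fields actually hashed and the hash function used in the branch are not specified here):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FNV-1a 64-bit, folded over arbitrary byte ranges.
static uint64_t fnv1a64(uint64_t h, const void * data, size_t len) {
    const auto * p = static_cast<const uint8_t *>(data);
    for (size_t i = 0; i < len; ++i) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

// Hash tensor names and dimensions so a checkpoint can be matched to
// the model it was built from; purely illustrative field choice.
static uint64_t model_id_hash(const std::vector<std::pair<std::string, std::vector<int64_t>>> & tensors) {
    uint64_t h = 0xcbf29ce484222325ULL;                                   // FNV offset basis
    for (const auto & [name, shape] : tensors) {
        h = fnv1a64(h, name.data(), name.size());                         // tensor name
        h = fnv1a64(h, shape.data(), shape.size() * sizeof(int64_t));     // dimensions
    }
    return h;
}
```
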
Ed Addario 044fa783c7
Fix trimming logic 2025-10-06 21:40:37 +01:00
Ed Addario 84ada44894
Uninstall signal handler and cleanup 2025-10-05 20:20:56 +01:00
Ed Addario 46706cec28
Persist progress 2025-10-05 20:20:28 +01:00
Ed Addario 74c62ed4e6
Add delete_bpw_state() 2025-10-05 20:19:03 +01:00
Ed Addario 02c3073b81
Add load_bpw_state() 2025-10-05 20:18:36 +01:00