Commit Graph

211 Commits

Author | SHA1 | Message | Date
Ed Addario 5b557ca958
Minor refactoring 2025-11-29 10:30:20 +00:00
Piotr Wilkin (ilintar) ff55414c42
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Ed Addario 6616008420
Use more descriptive option naming 2025-11-24 18:26:45 +00:00
Ed Addario 1c9993e131
Add --disable-tensor-importance option 2025-11-23 17:51:04 +00:00
Ed Addario 9ec3e6e262
Remove processing statistics_data 2025-11-23 17:49:53 +00:00
Ed Addario a0ba913613
Fix lambda capture bug in Windows and initialise candidate_types struct 2025-11-19 11:19:44 +00:00
Ed Addario ac8cfbdd12
Improved is_important() logic 2025-11-17 18:03:09 +00:00
Ed Addario b02b1b2304
Merge branch 'master' into quantize 2025-10-31 23:20:17 +00:00
Ed Addario c59bb6d49d
Add Euclidean-Cosine score to identify important tensors 2025-10-30 22:11:40 +00:00
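The "Euclidean-Cosine score" above combines two standard vector comparisons to rank tensor importance. A minimal sketch of such a score follows; the function name, the blending of the two terms, and the zero-norm handling are assumptions for illustration, not the repository's actual formula:

```python
import math

def euclidean_cosine_score(a, b):
    """Hypothetical importance score mixing Euclidean distance and
    cosine similarity between two equal-length vectors a and b.
    How the two terms are combined is an assumption."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cosine = dot / (na * nb) if na > 0 and nb > 0 else 0.0
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Large distance or low directional similarity both raise the score.
    return euclidean * (1.0 - cosine)
```

Identical vectors score 0.0; orthogonal vectors of equal length score their full Euclidean distance, so the measure penalises both magnitude and directional divergence.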
Ed Addario 6e32244a06
Read statistics from imatrix 2025-10-30 21:53:07 +00:00
Jan Boon d7395115ba
llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
Ed Addario f8863b9a80
Minor refactoring 2025-10-28 15:22:32 +00:00
Ed Addario 5303212324
Simplify tensor selection 2025-10-26 17:40:52 +00:00
Ed Addario d6ccd5649a
Finetune heuristics 2025-10-25 12:09:20 +01:00
Ed Addario 04561d5782
Update epsilon specifier 2025-10-21 12:53:26 +01:00
Ed Addario 27bf25e93c
Fix lambda capture 2025-10-20 22:04:35 +01:00
Ed Addario 543b5a99db
Fix lambda capture 2025-10-20 21:57:03 +01:00
Ed Addario fa1df81d49
Finetune heuristics 2025-10-20 20:52:23 +01:00
Ed Addario 41a0069613
Merge branch 'master' into quantize 2025-10-16 22:20:04 +01:00
Ed Addario a5103933bb
Minor refactoring 2025-10-16 15:11:48 +01:00
Ed Addario 0b3e930d52
Add option to override bpw state file name 2025-10-16 11:41:26 +01:00
Ed Addario a6853ea2ae
Add tensor type and depth heuristics 2025-10-16 11:20:24 +01:00
Xuan-Son Nguyen 3e3cb19f64
llama-quant: add support for mmproj (#16592)
* llama-quant: add support for mmproj

* Update src/llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* check prefix instead

* small fix

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-15 14:48:08 +02:00
Ed Addario b7911f1431
Minor refactoring 2025-10-13 17:46:45 +01:00
Ed Addario cd734b89ce
Update quant types 2025-10-13 15:15:23 +01:00
Ed Addario b1b58e67df
Refactor signal handlers 2025-10-13 14:54:32 +01:00
Ed Addario ca282302b5
Add --keep-bpw-state option 2025-10-12 18:23:23 +01:00
Ed Addario b6094a97bf
Add quant types 2025-10-12 16:30:35 +01:00
Ed Addario 12e0524f3a
Reduce compute time by parallelising tensor processing - courtesy of https://github.com/ddh0 2025-10-12 15:12:15 +01:00
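The parallelisation entry above dispatches independent per-tensor work across workers instead of processing tensors sequentially. A generic sketch of that pattern, assuming a placeholder `process_tensor` worker (the real per-tensor work in the repository is C++-side quantisation, not this stub):

```python
from concurrent.futures import ThreadPoolExecutor
import os

def process_tensor(name):
    # Placeholder for per-tensor work (e.g. dequantise, score, re-quantise).
    return name, len(name)

def process_all(tensor_names, max_workers=None):
    """Hypothetical sketch: map independent tensors onto a thread pool
    so total wall time scales with the slowest batch, not the sum."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_tensor, tensor_names))
```

This only helps when the per-tensor work releases the GIL (native code) or is I/O-bound; CPU-bound pure-Python work would use a process pool instead.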
Ed Addario 5b0d3f6d5a
Automatically determine if bias error is significant 2025-10-11 10:04:48 +01:00
Ed Addario c93131cef6
Remove --no-bias option 2025-10-10 13:26:51 +01:00
Ed Addario 3a3d807fc3
Remove bias mode computation 2025-10-10 13:10:42 +01:00
Ed Addario c11184a3c1
Generate model ID hash 2025-10-09 11:58:01 +01:00
Ed Addario 044fa783c7
Fix trimming logic 2025-10-06 21:40:37 +01:00
Ed Addario 84ada44894
Uninstall signal handler and cleanup 2025-10-05 20:20:56 +01:00
Ed Addario 46706cec28
Persist progress 2025-10-05 20:20:28 +01:00
Ed Addario 74c62ed4e6
Add delete_bpw_state() 2025-10-05 20:19:03 +01:00
Ed Addario 02c3073b81
Add load_bpw_state() 2025-10-05 20:18:36 +01:00
Ed Addario e48ca32f19
Add save_bpw_state() 2025-10-05 20:17:27 +01:00
Ed Addario 533cda3076
Add signal handler 2025-10-05 20:16:33 +01:00
Ed Addario 560e8c9d70
Relax lambda clamping 2025-10-05 14:41:42 +01:00
Ed Addario f5d8811ddd
Prioritise important tensors 2025-10-01 19:04:43 +01:00
Ed Addario b3b8a111a5
Compute rows based on tensor shape and slice count 2025-09-28 18:45:25 +01:00
Ed Addario e49e241d37
Calculate bpw over all tensors 2025-09-27 17:28:39 +01:00
Ed Addario 3d75b14c0f
Simplify dequantisation 2025-09-27 17:27:58 +01:00
Ed Addario 8a2c71f471
Check for direction reversal 2025-09-27 17:27:29 +01:00
Ed Addario 87cba65908
Tighten worker allocator 2025-09-27 17:26:30 +01:00
Ed Addario d16945730e
Refactor outlier trimming 2025-09-27 17:25:29 +01:00
Ed Addario dd4f4bd0b8
Reduce bpw range 2025-09-27 17:23:48 +01:00
Ed Addario dbdd179a92
Combine quant types 2025-09-25 19:50:20 +01:00
Ed Addario a74b410f5f
Move is_iq() into a lambda and remove unused variables 2025-09-25 19:49:47 +01:00
Ed Addario 8eedcf74bc
Increase scale multiplier 2025-09-22 20:42:37 +01:00
Ed Addario d36ee0a0a8
Add comments to explain magic numbers 2025-09-22 20:41:56 +01:00
Ed Addario 7ba6001ec8
Simplify candidates sorting 2025-09-22 20:11:54 +01:00
Ed Addario d79ade2e8e
Adjust for small vector size 2025-09-22 20:11:26 +01:00
Ed Addario f184450806
Fix minor logic flaw 2025-09-22 20:10:42 +01:00
Ed Addario 1fbc59f867
Replace slope with cross product 2025-09-22 20:10:10 +01:00
Ed Addario c855094dff
Exit loop if no better solution found 2025-09-22 20:09:11 +01:00
Ed Addario b748a1efa7
Fix typo 2025-09-21 22:03:54 +01:00
Ed Addario 896cdc2121
Refactor potential overflow 2025-09-21 22:03:36 +01:00
Ed Addario fecc472c61
Fix typos in variable names 2025-09-21 17:26:38 +01:00
Ed Addario e92db008bc
Refactor quantisation checks into its own function 2025-09-21 17:20:48 +01:00
Ed Addario 814f6b66be
Minor general refactoring 2025-09-21 16:45:09 +01:00
Ed Addario 0d5f18303e
Refactor lagrange_penalty() 2025-09-21 16:22:00 +01:00
Ed Addario 9a1656eb97
Refactor pareto optimise and convexify 2025-09-21 16:21:35 +01:00
Ed Addario 1a3e9ea4c8
Refactor estimate_error() 2025-09-21 16:21:00 +01:00
Ed Addario a7ee915e19
Refactor trimmed_sum() 2025-09-21 16:20:06 +01:00
Ed Addario b09662f86a
Refactor estimate_lambda() 2025-09-21 16:19:49 +01:00
Ed Addario 17be7615ce
Refactor candidate types build 2025-09-21 16:19:28 +01:00
Ed Addario 08146fd67f
Refactor side_data() and copy_or_broadcast() 2025-09-21 16:19:03 +01:00
Ed Addario 7386d4eadd
Refactor row sampling 2025-09-21 16:18:26 +01:00
Ed Addario b6c008fd8a
Refactor helper lambdas 2025-09-21 16:04:13 +01:00
Ed Addario b433fd9547
Refactor last budget pass 2025-09-21 13:43:09 +01:00
Ed Addario c466c53808
Refactor pareto pruning and convexification 2025-09-21 13:42:54 +01:00
Ed Addario 6b8cedf3bc
Refactor estimate_lambda() 2025-09-21 13:42:31 +01:00
Ed Addario bdefdb673c
Refactor copy_or_broadcast() 2025-09-21 13:42:07 +01:00
Ed Addario e8e2aed17a
Refactor row sampling 2025-09-21 13:41:44 +01:00
Ed Addario 9e74f83411
Replace --bpw-bias flag with --no-bias 2025-09-20 23:06:37 +01:00
Ed Addario ab02bb1f3e
Merge branch 'master' into quantize 2025-09-20 21:41:25 +01:00
Ed Addario a36946997e
Replace fast_bias() for per slice version and remove precise_bias() 2025-09-20 21:36:54 +01:00
Ed Addario 14fae69a7b
General refactoring 2025-09-20 21:31:31 +01:00
Jie Fu (傅杰) 745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 09:30:55 +02:00
Ed Addario ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize 2025-09-15 07:42:37 +01:00
Ed Addario 9b857e3984
Merge branch 'ggml-org:master' into quantize 2025-09-14 23:35:43 +01:00
Ed Addario c709e1a335
Fix MoE tensor estimation 2025-09-14 22:38:27 +01:00
Ed Addario 8503d59ee4
Increase IQ options 2025-09-13 11:49:18 +01:00
Ed Addario 2b516068e2
"Convexify" candidate list 2025-09-13 09:41:52 +01:00
Ed Addario 12e816b511
Replace greedy allocator with lagrangian relaxation 2025-09-13 09:24:23 +01:00
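Lagrangian relaxation for bit allocation replaces greedy spending with a single multiplier: each tensor independently picks the candidate minimising error + λ·bits, and λ is bisected until the total bit spend fits the budget. A self-contained sketch (the data layout, bounds, and iteration count here are assumptions, not the repository's implementation):

```python
def allocate(candidates, bit_budget, iters=60):
    """Hypothetical sketch of Lagrangian-relaxation bit allocation.
    `candidates` maps tensor name -> list of (bits, error) options."""
    def pick(lam):
        # Each tensor minimises its own Lagrangian cost independently.
        return {name: min(opts, key=lambda o: o[1] + lam * o[0])
                for name, opts in candidates.items()}

    lo, hi = 0.0, 1e6  # hi large enough that the cheapest options win
    for _ in range(iters):
        lam = (lo + hi) / 2
        total = sum(b for b, _ in pick(lam).values())
        if total > bit_budget:
            lo = lam   # over budget: penalise bits harder
        else:
            hi = lam   # within budget: try spending more
    return pick(hi)    # hi always satisfies the budget
```

Unlike a greedy allocator, every tensor's choice depends only on the shared λ, so the per-iteration work is embarrassingly parallel and the result is optimal over the convexified candidate sets.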
Ed Addario 7d85993f26
Minor refactoring 2025-09-13 08:44:41 +01:00
Ed Addario 4dff85fbe5
Improve precise_lambda() efficiency 2025-09-13 08:41:37 +01:00
Ed Addario bc8762f27f
Capture surrounding function name 2025-09-13 08:33:22 +01:00
Ed Addario 886536d80a
Increase error type precision 2025-09-13 08:27:23 +01:00
ddh0 df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00
Ed Addario 04c07b3272
Add better control over MSE and directional bias computation 2025-09-10 18:00:56 +01:00
Ed Addario eab8708244
Minor factoring for efficiency and correctness 2025-08-30 10:14:46 +01:00
Ed Addario 556f6b04fe
Add --precise-lambda option 2025-08-28 16:08:08 +01:00
Ed Addario 66aff8fa1e
Add precise_lambda() 2025-08-28 16:06:42 +01:00
Ed Addario 8df1d00ae4
Add directional scaling 2025-08-28 16:04:28 +01:00
Ed Addario 04946114c9
Refactor epsilon into a function-wide variable 2025-08-28 16:01:03 +01:00
Ed Addario 4286690019
Minor comment update 2025-08-26 21:39:40 +01:00