llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	fa1df81d49	Finetune heuristics	2025-10-20 20:52:23 +01:00
Ed Addario	41a0069613	Merge branch 'master' into quantize	2025-10-16 22:20:04 +01:00
Ed Addario	a5103933bb	Minor refactoring	2025-10-16 15:11:48 +01:00
Ed Addario	0b3e930d52	Add option to override bpw state file name	2025-10-16 11:41:26 +01:00
Ed Addario	a6853ea2ae	Add tensor type and depth heuristics	2025-10-16 11:20:24 +01:00
Xuan-Son Nguyen	3e3cb19f64	llama-quant: add support for mmproj (#16592) * llama-quant: add support for mmproj * Update src/llama.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * check prefix instead * small fix --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-15 14:48:08 +02:00
Ed Addario	b7911f1431	Minor refactoring	2025-10-13 17:46:45 +01:00
Ed Addario	cd734b89ce	Update quant types	2025-10-13 15:15:23 +01:00
Ed Addario	b1b58e67df	Refactor signal handlers	2025-10-13 14:54:32 +01:00
Ed Addario	ca282302b5	Add --keep-bpw-state option	2025-10-12 18:23:23 +01:00
Ed Addario	b6094a97bf	Add quant types	2025-10-12 16:30:35 +01:00
Ed Addario	12e0524f3a	Reduce compute time by parallelising tensor processing - courtesy of https://github.com/ddh0	2025-10-12 15:12:15 +01:00
Ed Addario	5b0d3f6d5a	Automatically determine if bias error is significant	2025-10-11 10:04:48 +01:00
Ed Addario	c93131cef6	Remove --no-bias option	2025-10-10 13:26:51 +01:00
Ed Addario	3a3d807fc3	Remove bias mode computation	2025-10-10 13:10:42 +01:00
Ed Addario	c11184a3c1	Generate model ID hash	2025-10-09 11:58:01 +01:00
Ed Addario	044fa783c7	Fix trimming logic	2025-10-06 21:40:37 +01:00
Ed Addario	84ada44894	Uninstall signal handler and cleanup	2025-10-05 20:20:56 +01:00
Ed Addario	46706cec28	Persist progress	2025-10-05 20:20:28 +01:00
Ed Addario	74c62ed4e6	Add delete_bpw_state()	2025-10-05 20:19:03 +01:00
Ed Addario	02c3073b81	Add load_bpw_state()	2025-10-05 20:18:36 +01:00
Ed Addario	e48ca32f19	Add save_bpw_state()	2025-10-05 20:17:27 +01:00
Ed Addario	533cda3076	Add signal handler	2025-10-05 20:16:33 +01:00
Ed Addario	560e8c9d70	Relax lambda clamping	2025-10-05 14:41:42 +01:00
Ed Addario	f5d8811ddd	Prioritise important tensors	2025-10-01 19:04:43 +01:00
Ed Addario	b3b8a111a5	Compute rows based on tensor shape and slice count	2025-09-28 18:45:25 +01:00
Ed Addario	e49e241d37	Calculate bpw over all tensors	2025-09-27 17:28:39 +01:00
Ed Addario	3d75b14c0f	Simplify dequantisation	2025-09-27 17:27:58 +01:00
Ed Addario	8a2c71f471	Check for direction reversal	2025-09-27 17:27:29 +01:00
Ed Addario	87cba65908	Tighten worker allocator	2025-09-27 17:26:30 +01:00
Ed Addario	d16945730e	Refactor outlier trimming	2025-09-27 17:25:29 +01:00
Ed Addario	dd4f4bd0b8	Reduce bpw range	2025-09-27 17:23:48 +01:00
Ed Addario	dbdd179a92	Combine quant types	2025-09-25 19:50:20 +01:00
Ed Addario	a74b410f5f	Move is_iq() into a lambda and remove unused variables	2025-09-25 19:49:47 +01:00
Ed Addario	8eedcf74bc	Increase scale multiplier	2025-09-22 20:42:37 +01:00
Ed Addario	d36ee0a0a8	Add comments to explain magic numbers	2025-09-22 20:41:56 +01:00
Ed Addario	7ba6001ec8	Simplify candidates sorting	2025-09-22 20:11:54 +01:00
Ed Addario	d79ade2e8e	Adjust for small vector size	2025-09-22 20:11:26 +01:00
Ed Addario	f184450806	Fix minor logic flaw	2025-09-22 20:10:42 +01:00
Ed Addario	1fbc59f867	Replace slope with cross product	2025-09-22 20:10:10 +01:00
Ed Addario	c855094dff	Exit loop if no better solution found	2025-09-22 20:09:11 +01:00
Ed Addario	b748a1efa7	Fix typo	2025-09-21 22:03:54 +01:00
Ed Addario	896cdc2121	Refactor potential overflow	2025-09-21 22:03:36 +01:00
Ed Addario	fecc472c61	Fix typos in variable names	2025-09-21 17:26:38 +01:00
Ed Addario	e92db008bc	Refactor quantisation checks into its own function	2025-09-21 17:20:48 +01:00
Ed Addario	814f6b66be	Minor general refactoring	2025-09-21 16:45:09 +01:00
Ed Addario	0d5f18303e	Refactor lagrange_penalty()	2025-09-21 16:22:00 +01:00
Ed Addario	9a1656eb97	Refactor pareto optimise and convexify	2025-09-21 16:21:35 +01:00
Ed Addario	1a3e9ea4c8	Refactor estimate_error()	2025-09-21 16:21:00 +01:00
Ed Addario	a7ee915e19	Refactor trimmed_sum()	2025-09-21 16:20:06 +01:00
Ed Addario	b09662f86a	Refactor estimate_lambda()	2025-09-21 16:19:49 +01:00
Ed Addario	17be7615ce	Refactor candidate types build	2025-09-21 16:19:28 +01:00
Ed Addario	08146fd67f	Refactor side_data() and copy_or_broadcast()	2025-09-21 16:19:03 +01:00
Ed Addario	7386d4eadd	Refactor row sampling	2025-09-21 16:18:26 +01:00
Ed Addario	b6c008fd8a	Refactor helper lambdas	2025-09-21 16:04:13 +01:00
Ed Addario	b433fd9547	Refactor last budget pass	2025-09-21 13:43:09 +01:00
Ed Addario	c466c53808	Refactor pareto pruning and convexification	2025-09-21 13:42:54 +01:00
Ed Addario	6b8cedf3bc	Refactor estimate_lambda()	2025-09-21 13:42:31 +01:00
Ed Addario	bdefdb673c	Refactor copy_or_broadcast()	2025-09-21 13:42:07 +01:00
Ed Addario	e8e2aed17a	Refactor row sampling	2025-09-21 13:41:44 +01:00
Ed Addario	9e74f83411	Replace --bpw-bias flag with --no-bias	2025-09-20 23:06:37 +01:00
Ed Addario	ab02bb1f3e	Merge branch 'master' into quantize	2025-09-20 21:41:25 +01:00
Ed Addario	a36946997e	Replace fast_bias() for per slice version and remove precise_bias()	2025-09-20 21:36:54 +01:00
Ed Addario	14fae69a7b	General refactoring	2025-09-20 21:31:31 +01:00
Jie Fu (傅杰)	745cbcf2fe	llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-09-17 09:30:55 +02:00
Ed Addario	ad70fca5b2	Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize	2025-09-15 07:42:37 +01:00
Ed Addario	9b857e3984	Merge branch 'ggml-org:master' into quantize	2025-09-14 23:35:43 +01:00
Ed Addario	c709e1a335	Fix MoE tensor estimation	2025-09-14 22:38:27 +01:00
Ed Addario	8503d59ee4	Increase IQ options	2025-09-13 11:49:18 +01:00
Ed Addario	2b516068e2	"Convexify" candidate list	2025-09-13 09:41:52 +01:00
Ed Addario	12e816b511	Replace greedy allocator with lagrangian relaxation	2025-09-13 09:24:23 +01:00
Ed Addario	7d85993f26	Minor refactoring	2025-09-13 08:44:41 +01:00
Ed Addario	4dff85fbe5	Improve precise_lambda() efficiency	2025-09-13 08:41:37 +01:00
Ed Addario	bc8762f27f	Capture surrounding function name	2025-09-13 08:33:22 +01:00
Ed Addario	886536d80a	Increase error type precision	2025-09-13 08:27:23 +01:00
ddh0	df082f5630	nitpick : correct MB to MiB (#15934) MB was incorrectly used for 1024 x 1024 bytes instead of MiB	2025-09-11 19:12:34 +02:00
Ed Addario	04c07b3272	Add better control over MSE and directional bias computation	2025-09-10 18:00:56 +01:00
Ed Addario	eab8708244	Minor factoring for efficiency and correctness	2025-08-30 10:14:46 +01:00
Ed Addario	556f6b04fe	Add --precise-lambda option	2025-08-28 16:08:08 +01:00
Ed Addario	66aff8fa1e	Add precise_lambda()	2025-08-28 16:06:42 +01:00
Ed Addario	8df1d00ae4	Add directional scaling	2025-08-28 16:04:28 +01:00
Ed Addario	04946114c9	Refactor epsilon into a function-wide variable	2025-08-28 16:01:03 +01:00
Ed Addario	4286690019	Minor comment update	2025-08-26 21:39:40 +01:00
Ed Addario	d4ac2106fb	Improve logging and some minor code refactoring	2025-08-24 13:39:10 +01:00
Ed Addario	61c0e01f50	Execute bpw_overrides() only if an imatrix file is provided	2025-08-24 13:36:03 +01:00
Ed Addario	3856d60328	Restrict quant types per family	2025-08-23 14:45:07 +01:00
Ed Addario	decafae270	Adjust bias_lambda	2025-08-23 11:30:11 +01:00
Ed Addario	68ae5e66ce	Improve list of candidate types	2025-08-23 02:50:55 +01:00
Ed Addario	73124a9921	Refactor estimate_error()	2025-08-23 02:17:22 +01:00
Ed Addario	f75265f55b	Fix typo	2025-08-23 01:08:37 +01:00
Ed Addario	9a4b115497	Explicitly adding <atomic> include	2025-08-23 01:08:01 +01:00
Ed Addario	6d17889add	Log if override is from tensor-type or from bpw-target	2025-08-22 16:58:46 +01:00
Ed Addario	fea99d051a	Refactor and combine lambdas	2025-08-22 16:57:58 +01:00
Ed Addario	f05c8483d8	Improve dequantized_buffer fill	2025-08-22 09:17:58 +01:00
Ed Addario	897decbe8a	Show skipped IQ tensors	2025-08-22 09:15:11 +01:00
Ed Addario	01c927fb94	Improve pareto efficient candidate selection	2025-08-22 09:14:14 +01:00
Ed Addario	47cdbe2155	Reduce sampling window to speedup process	2025-08-22 09:11:11 +01:00
Ed Addario	2f13fee795	Parameterise type	2025-08-22 09:05:55 +01:00
Ed Addario	bb0d912c1f	Update comments	2025-08-22 09:02:56 +01:00
Ed Addario	35c1504441	Fix byte count for 3d or higher tensors	2025-08-22 09:01:57 +01:00

194 Commits