llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	08146fd67f	Refactor side_data() and copy_or_broadcast()	2025-09-21 16:19:03 +01:00
Ed Addario	7386d4eadd	Refactor row sampling	2025-09-21 16:18:26 +01:00
Ed Addario	b6c008fd8a	Refactor helper lambdas	2025-09-21 16:04:13 +01:00
Ed Addario	b433fd9547	Refactor last budget pass	2025-09-21 13:43:09 +01:00
Ed Addario	c466c53808	Refactor pareto pruning and convexification	2025-09-21 13:42:54 +01:00
Ed Addario	6b8cedf3bc	Refactor estimate_lambda()	2025-09-21 13:42:31 +01:00
Ed Addario	bdefdb673c	Refactor copy_or_broadcast()	2025-09-21 13:42:07 +01:00
Ed Addario	e8e2aed17a	Refactor row sampling	2025-09-21 13:41:44 +01:00
Ed Addario	9e74f83411	Replace --bpw-bias flag with --no-bias	2025-09-20 23:06:37 +01:00
Ed Addario	ab02bb1f3e	Merge branch 'master' into quantize	2025-09-20 21:41:25 +01:00
Ed Addario	a36946997e	Replace fast_bias() for per slice version and remove precise_bias()	2025-09-20 21:36:54 +01:00
Ed Addario	14fae69a7b	General refactoring	2025-09-20 21:31:31 +01:00
Jie Fu (傅杰)	745cbcf2fe	llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) Signed-off-by: Jie Fu <jiefu@tencent.com>	2025-09-17 09:30:55 +02:00
Ed Addario	ad70fca5b2	Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize	2025-09-15 07:42:37 +01:00
Ed Addario	9b857e3984	Merge branch 'ggml-org:master' into quantize	2025-09-14 23:35:43 +01:00
Ed Addario	c709e1a335	Fix MoE tensor estimation	2025-09-14 22:38:27 +01:00
Ed Addario	8503d59ee4	Increase IQ options	2025-09-13 11:49:18 +01:00
Ed Addario	2b516068e2	"Convexify" candidate list	2025-09-13 09:41:52 +01:00
Ed Addario	12e816b511	Replace greedy allocator with lagrangian relaxation	2025-09-13 09:24:23 +01:00
Ed Addario	7d85993f26	Minor refactoring	2025-09-13 08:44:41 +01:00
Ed Addario	4dff85fbe5	Improve precise_lambda() efficiency	2025-09-13 08:41:37 +01:00
Ed Addario	bc8762f27f	Capture surrounding function name	2025-09-13 08:33:22 +01:00
Ed Addario	886536d80a	Increase error type precision	2025-09-13 08:27:23 +01:00
ddh0	df082f5630	nitpick : correct MB to MiB (#15934) MB was incorrectly used for 1024 x 1024 bytes instead of MiB	2025-09-11 19:12:34 +02:00
Ed Addario	04c07b3272	Add better control over MSE and directional bias computation	2025-09-10 18:00:56 +01:00
Ed Addario	eab8708244	Minor factoring for efficiency and correctness	2025-08-30 10:14:46 +01:00
Ed Addario	556f6b04fe	Add --precise-lambda option	2025-08-28 16:08:08 +01:00
Ed Addario	66aff8fa1e	Add precise_lambda()	2025-08-28 16:06:42 +01:00
Ed Addario	8df1d00ae4	Add directional scaling	2025-08-28 16:04:28 +01:00
Ed Addario	04946114c9	Refactor epsilon into a function-wide variable	2025-08-28 16:01:03 +01:00
Ed Addario	4286690019	Minor comment update	2025-08-26 21:39:40 +01:00
Ed Addario	d4ac2106fb	Improve logging and some minor code refactoring	2025-08-24 13:39:10 +01:00
Ed Addario	61c0e01f50	Execute bpw_overrides() only if an imatrix file is provided	2025-08-24 13:36:03 +01:00
Ed Addario	3856d60328	Restrict quant types per family	2025-08-23 14:45:07 +01:00
Ed Addario	decafae270	Adjust bias_lambda	2025-08-23 11:30:11 +01:00
Ed Addario	68ae5e66ce	Improve list of candidate types	2025-08-23 02:50:55 +01:00
Ed Addario	73124a9921	Refactor estimate_error()	2025-08-23 02:17:22 +01:00
Ed Addario	f75265f55b	Fix typo	2025-08-23 01:08:37 +01:00
Ed Addario	9a4b115497	Explicitly adding <atomic> include	2025-08-23 01:08:01 +01:00
Ed Addario	6d17889add	Log if override is from tensor-type or from bpw-target	2025-08-22 16:58:46 +01:00
Ed Addario	fea99d051a	Refactor and combine lambdas	2025-08-22 16:57:58 +01:00
Ed Addario	f05c8483d8	Improve dequantized_buffer fill	2025-08-22 09:17:58 +01:00
Ed Addario	897decbe8a	Show skipped IQ tensors	2025-08-22 09:15:11 +01:00
Ed Addario	01c927fb94	Improve pareto efficient candidate selection	2025-08-22 09:14:14 +01:00
Ed Addario	47cdbe2155	Reduce sampling window to speedup process	2025-08-22 09:11:11 +01:00
Ed Addario	2f13fee795	Parameterise type	2025-08-22 09:05:55 +01:00
Ed Addario	bb0d912c1f	Update comments	2025-08-22 09:02:56 +01:00
Ed Addario	35c1504441	Fix byte count for 3d or higher tensors	2025-08-22 09:01:57 +01:00
Ed Addario	ec0afbe79f	Include embeddings and output tensors	2025-08-22 01:46:09 +01:00
Ed Addario	5b6f1e9fde	General code refactor	2025-08-21 19:18:54 +01:00


92 Commits