80 Commits

Author SHA1 Message Date
Ed Addario 14fae69a7b
General refactoring 2025-09-20 21:31:31 +01:00
Ed Addario ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize 2025-09-15 07:42:37 +01:00
Ed Addario 9b857e3984
Merge branch 'ggml-org:master' into quantize 2025-09-14 23:35:43 +01:00
Ed Addario c709e1a335
Fix MoE tensor estimation 2025-09-14 22:38:27 +01:00
Ed Addario 8503d59ee4
Increase IQ options 2025-09-13 11:49:18 +01:00
Ed Addario 2b516068e2
"Convexify" candidate list 2025-09-13 09:41:52 +01:00
Ed Addario 12e816b511
Replace greedy allocator with lagrangian relaxation 2025-09-13 09:24:23 +01:00
Ed Addario 7d85993f26
Minor refactoring 2025-09-13 08:44:41 +01:00
Ed Addario 4dff85fbe5
Improve precise_lambda() efficiency 2025-09-13 08:41:37 +01:00
Ed Addario bc8762f27f
Capture surrounding function name 2025-09-13 08:33:22 +01:00
Ed Addario 886536d80a
Increase error type precision 2025-09-13 08:27:23 +01:00
ddh0 df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00
Ed Addario 04c07b3272
Add better control over MSE and directional bias computation 2025-09-10 18:00:56 +01:00
Ed Addario eab8708244
Minor factoring for efficiency and correctness 2025-08-30 10:14:46 +01:00
Ed Addario 556f6b04fe
Add --precise-lambda option 2025-08-28 16:08:08 +01:00
Ed Addario 66aff8fa1e
Add precise_lambda() 2025-08-28 16:06:42 +01:00
Ed Addario 8df1d00ae4
Add directional scaling 2025-08-28 16:04:28 +01:00
Ed Addario 04946114c9
Refactor epsilon into a function-wide variable 2025-08-28 16:01:03 +01:00
Ed Addario 4286690019
Minor comment update 2025-08-26 21:39:40 +01:00
Ed Addario d4ac2106fb
Improve logging and some minor code refactoring 2025-08-24 13:39:10 +01:00
Ed Addario 61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided 2025-08-24 13:36:03 +01:00
Ed Addario 3856d60328
Restrict quant types per family 2025-08-23 14:45:07 +01:00
Ed Addario decafae270
Adjust bias_lambda 2025-08-23 11:30:11 +01:00
Ed Addario 68ae5e66ce
Improve list of candidate types 2025-08-23 02:50:55 +01:00
Ed Addario 73124a9921
Refactor estimate_error() 2025-08-23 02:17:22 +01:00
Ed Addario f75265f55b
Fix typo 2025-08-23 01:08:37 +01:00
Ed Addario 9a4b115497
Explicitly adding <atomic> include 2025-08-23 01:08:01 +01:00
Ed Addario 6d17889add
Log if override is from tensor-type or from bpw-target 2025-08-22 16:58:46 +01:00
Ed Addario fea99d051a
Refactor and combine lambdas 2025-08-22 16:57:58 +01:00
Ed Addario f05c8483d8
Improve dequantized_buffer fill 2025-08-22 09:17:58 +01:00
Ed Addario 897decbe8a
Show skipped IQ tensors 2025-08-22 09:15:11 +01:00
Ed Addario 01c927fb94
Improve pareto efficient candidate selection 2025-08-22 09:14:14 +01:00
Ed Addario 47cdbe2155
Reduce sampling window to speedup process 2025-08-22 09:11:11 +01:00
Ed Addario 2f13fee795
Parameterise type 2025-08-22 09:05:55 +01:00
Ed Addario bb0d912c1f
Update comments 2025-08-22 09:02:56 +01:00
Ed Addario 35c1504441
Fix byte count for 3d or higher tensors 2025-08-22 09:01:57 +01:00
Ed Addario ec0afbe79f
Include embeddings and output tensors 2025-08-22 01:46:09 +01:00
Ed Addario 5b6f1e9fde
General code refactor 2025-08-21 19:18:54 +01:00
Ed Addario 9e11f82e8f
Precompute error denominator in estimate_erro() 2025-08-21 16:25:31 +01:00
Ed Addario 887490c5ec
Dequantise sampled rows only 2025-08-21 15:11:49 +01:00
Ed Addario e01dad886b
Parallelise candidate evaluation 2025-08-21 12:47:13 +01:00
Ed Addario 95b2ab2800
Change error estimate to use normalised weighted MSE 2025-08-21 10:46:37 +01:00
Ed Addario 5ef493ea1a
Exclude embeddings and output tensor 2025-08-21 09:48:29 +01:00
Ed Addario 35ad0fc4ad
Improve error estimation using weighted MSE 2025-08-20 23:27:20 +01:00
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00
Ed Addario 43caadf783
Add better fallbacks for IQ mixes 2025-08-20 17:24:48 +01:00
Ed Addario 29b2dc3ec0
Do not mix K and IQ quants 2025-08-20 13:27:01 +01:00
Ed Addario 5cd69a6809
Add F16/BF16 type 2025-08-20 09:41:39 +01:00