Ed Addario
|
14fae69a7b
|
General refactoring
|
2025-09-20 21:31:31 +01:00 |
Ed Addario
|
ad70fca5b2
|
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize
|
2025-09-15 07:42:37 +01:00 |
Ed Addario
|
9b857e3984
|
Merge branch 'ggml-org:master' into quantize
|
2025-09-14 23:35:43 +01:00 |
Ed Addario
|
c709e1a335
|
Fix MoE tensor estimation
|
2025-09-14 22:38:27 +01:00 |
Ed Addario
|
8503d59ee4
|
Increase IQ options
|
2025-09-13 11:49:18 +01:00 |
Ed Addario
|
2b516068e2
|
"Convexify" candidate list
|
2025-09-13 09:41:52 +01:00 |
Ed Addario
|
12e816b511
|
Replace greedy allocator with lagrangian relaxation
|
2025-09-13 09:24:23 +01:00 |
Ed Addario
|
7d85993f26
|
Minor refactoring
|
2025-09-13 08:44:41 +01:00 |
Ed Addario
|
4dff85fbe5
|
Improve precise_lambda() efficiency
|
2025-09-13 08:41:37 +01:00 |
Ed Addario
|
bc8762f27f
|
Capture surrounding function name
|
2025-09-13 08:33:22 +01:00 |
Ed Addario
|
886536d80a
|
Increase error type precision
|
2025-09-13 08:27:23 +01:00 |
ddh0
|
df082f5630
|
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
|
2025-09-11 19:12:34 +02:00 |
Ed Addario
|
04c07b3272
|
Add better control over MSE and directional bias computation
|
2025-09-10 18:00:56 +01:00 |
Ed Addario
|
eab8708244
|
Minor factoring for efficiency and correctness
|
2025-08-30 10:14:46 +01:00 |
Ed Addario
|
556f6b04fe
|
Add --precise-lambda option
|
2025-08-28 16:08:08 +01:00 |
Ed Addario
|
66aff8fa1e
|
Add precise_lambda()
|
2025-08-28 16:06:42 +01:00 |
Ed Addario
|
8df1d00ae4
|
Add directional scaling
|
2025-08-28 16:04:28 +01:00 |
Ed Addario
|
04946114c9
|
Refactor epsilon into a function-wide variable
|
2025-08-28 16:01:03 +01:00 |
Ed Addario
|
4286690019
|
Minor comment update
|
2025-08-26 21:39:40 +01:00 |
Ed Addario
|
d4ac2106fb
|
Improve logging and some minor code refactoring
|
2025-08-24 13:39:10 +01:00 |
Ed Addario
|
61c0e01f50
|
Execute bpw_overrides() only if an imatrix file is provided
|
2025-08-24 13:36:03 +01:00 |
Ed Addario
|
3856d60328
|
Restrict quant types per family
|
2025-08-23 14:45:07 +01:00 |
Ed Addario
|
decafae270
|
Adjust bias_lambda
|
2025-08-23 11:30:11 +01:00 |
Ed Addario
|
68ae5e66ce
|
Improve list of candidate types
|
2025-08-23 02:50:55 +01:00 |
Ed Addario
|
73124a9921
|
Refactor estimate_error()
|
2025-08-23 02:17:22 +01:00 |
Ed Addario
|
f75265f55b
|
Fix typo
|
2025-08-23 01:08:37 +01:00 |
Ed Addario
|
9a4b115497
|
Explicitly adding <atomic> include
|
2025-08-23 01:08:01 +01:00 |
Ed Addario
|
6d17889add
|
Log if override is from tensor-type or from bpw-target
|
2025-08-22 16:58:46 +01:00 |
Ed Addario
|
fea99d051a
|
Refactor and combine lambdas
|
2025-08-22 16:57:58 +01:00 |
Ed Addario
|
f05c8483d8
|
Improve dequantized_buffer fill
|
2025-08-22 09:17:58 +01:00 |
Ed Addario
|
897decbe8a
|
Show skipped IQ tensors
|
2025-08-22 09:15:11 +01:00 |
Ed Addario
|
01c927fb94
|
Improve pareto efficient candidate selection
|
2025-08-22 09:14:14 +01:00 |
Ed Addario
|
47cdbe2155
|
Reduce sampling window to speed up process
|
2025-08-22 09:11:11 +01:00 |
Ed Addario
|
2f13fee795
|
Parameterise type
|
2025-08-22 09:05:55 +01:00 |
Ed Addario
|
bb0d912c1f
|
Update comments
|
2025-08-22 09:02:56 +01:00 |
Ed Addario
|
35c1504441
|
Fix byte count for 3d or higher tensors
|
2025-08-22 09:01:57 +01:00 |
Ed Addario
|
ec0afbe79f
|
Include embeddings and output tensors
|
2025-08-22 01:46:09 +01:00 |
Ed Addario
|
5b6f1e9fde
|
General code refactor
|
2025-08-21 19:18:54 +01:00 |
Ed Addario
|
9e11f82e8f
|
Precompute error denominator in estimate_error()
|
2025-08-21 16:25:31 +01:00 |
Ed Addario
|
887490c5ec
|
Dequantise sampled rows only
|
2025-08-21 15:11:49 +01:00 |
Ed Addario
|
e01dad886b
|
Parallelise candidate evaluation
|
2025-08-21 12:47:13 +01:00 |
Ed Addario
|
95b2ab2800
|
Change error estimate to use normalised weighted MSE
|
2025-08-21 10:46:37 +01:00 |
Ed Addario
|
5ef493ea1a
|
Exclude embeddings and output tensor
|
2025-08-21 09:48:29 +01:00 |
Ed Addario
|
35ad0fc4ad
|
Improve error estimation using weighted MSE
|
2025-08-20 23:27:20 +01:00 |
Ed Addario
|
b0b33b7ccb
|
Optimise tensor sampling
|
2025-08-20 20:58:26 +01:00 |
Ed Addario
|
3f0118d602
|
Fix bias lambda bug
|
2025-08-20 17:26:37 +01:00 |
Ed Addario
|
52da4a4f8c
|
Skip if output.weight or type is COPY
|
2025-08-20 17:26:05 +01:00 |
Ed Addario
|
43caadf783
|
Add better fallbacks for IQ mixes
|
2025-08-20 17:24:48 +01:00 |
Ed Addario
|
29b2dc3ec0
|
Do not mix K and IQ quants
|
2025-08-20 13:27:01 +01:00 |
Ed Addario
|
5cd69a6809
|
Add F16/BF16 type
|
2025-08-20 09:41:39 +01:00 |