Ed Addario | e48ca32f19 | Add save_bpw_state() | 2025-10-05 20:17:27 +01:00
Ed Addario | 533cda3076 | Add signal handler | 2025-10-05 20:16:33 +01:00
Ed Addario | 560e8c9d70 | Relax lambda clamping | 2025-10-05 14:41:42 +01:00
Ed Addario | f5d8811ddd | Prioritise important tensors | 2025-10-01 19:04:43 +01:00
Ed Addario | b3b8a111a5 | Compute rows based on tensor shape and slice count | 2025-09-28 18:45:25 +01:00
Ed Addario | e49e241d37 | Calculate bpw over all tensors | 2025-09-27 17:28:39 +01:00
Ed Addario | 3d75b14c0f | Simplify dequantisation | 2025-09-27 17:27:58 +01:00
Ed Addario | 8a2c71f471 | Check for direction reversal | 2025-09-27 17:27:29 +01:00
Ed Addario | 87cba65908 | Tighten worker allocator | 2025-09-27 17:26:30 +01:00
Ed Addario | d16945730e | Refactor outlier trimming | 2025-09-27 17:25:29 +01:00
Ed Addario | dd4f4bd0b8 | Reduce bpw range | 2025-09-27 17:23:48 +01:00
Ed Addario | dbdd179a92 | Combine quant types | 2025-09-25 19:50:20 +01:00
Ed Addario | a74b410f5f | Move is_iq() into a lambda and remove unused variables | 2025-09-25 19:49:47 +01:00
Ed Addario | 8eedcf74bc | Increase scale multiplier | 2025-09-22 20:42:37 +01:00
Ed Addario | d36ee0a0a8 | Add comments to explain magic numbers | 2025-09-22 20:41:56 +01:00
Ed Addario | 7ba6001ec8 | Simplify candidates sorting | 2025-09-22 20:11:54 +01:00
Ed Addario | d79ade2e8e | Adjust for small vector size | 2025-09-22 20:11:26 +01:00
Ed Addario | f184450806 | Fix minor logic flaw | 2025-09-22 20:10:42 +01:00
Ed Addario | 1fbc59f867 | Replace slope with cross product | 2025-09-22 20:10:10 +01:00
Ed Addario | c855094dff | Exit loop if no better solution found | 2025-09-22 20:09:11 +01:00
Ed Addario | b748a1efa7 | Fix typo | 2025-09-21 22:03:54 +01:00
Ed Addario | 896cdc2121 | Refactor potential overflow | 2025-09-21 22:03:36 +01:00
Ed Addario | fecc472c61 | Fix typos in variable names | 2025-09-21 17:26:38 +01:00
Ed Addario | e92db008bc | Refactor quantisation checks into its own function | 2025-09-21 17:20:48 +01:00
Ed Addario | 814f6b66be | Minor general refactoring | 2025-09-21 16:45:09 +01:00
Ed Addario | 0d5f18303e | Refactor lagrange_penalty() | 2025-09-21 16:22:00 +01:00
Ed Addario | 9a1656eb97 | Refactor pareto optimise and convexify | 2025-09-21 16:21:35 +01:00
Ed Addario | 1a3e9ea4c8 | Refactor estimate_error() | 2025-09-21 16:21:00 +01:00
Ed Addario | a7ee915e19 | Refactor trimmed_sum() | 2025-09-21 16:20:06 +01:00
Ed Addario | b09662f86a | Refactor estimate_lambda() | 2025-09-21 16:19:49 +01:00
Ed Addario | 17be7615ce | Refactor candidate types build | 2025-09-21 16:19:28 +01:00
Ed Addario | 08146fd67f | Refactor side_data() and copy_or_broadcast() | 2025-09-21 16:19:03 +01:00
Ed Addario | 7386d4eadd | Refactor row sampling | 2025-09-21 16:18:26 +01:00
Ed Addario | b6c008fd8a | Refactor helper lambdas | 2025-09-21 16:04:13 +01:00
Ed Addario | b433fd9547 | Refactor last budget pass | 2025-09-21 13:43:09 +01:00
Ed Addario | c466c53808 | Refactor pareto pruning and convexification | 2025-09-21 13:42:54 +01:00
Ed Addario | 6b8cedf3bc | Refactor estimate_lambda() | 2025-09-21 13:42:31 +01:00
Ed Addario | bdefdb673c | Refactor copy_or_broadcast() | 2025-09-21 13:42:07 +01:00
Ed Addario | e8e2aed17a | Refactor row sampling | 2025-09-21 13:41:44 +01:00
Ed Addario | 9e74f83411 | Replace --bpw-bias flag with --no-bias | 2025-09-20 23:06:37 +01:00
Ed Addario | ab02bb1f3e | Merge branch 'master' into quantize | 2025-09-20 21:41:25 +01:00
Ed Addario | a36946997e | Replace fast_bias() for per slice version and remove precise_bias() | 2025-09-20 21:36:54 +01:00
Ed Addario | 14fae69a7b | General refactoring | 2025-09-20 21:31:31 +01:00
Jie Fu (傅杰) | 745cbcf2fe | llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) | 2025-09-17 09:30:55 +02:00
    Signed-off-by: Jie Fu <jiefu@tencent.com>
Ed Addario | ad70fca5b2 | Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize | 2025-09-15 07:42:37 +01:00
Ed Addario | 9b857e3984 | Merge branch 'ggml-org:master' into quantize | 2025-09-14 23:35:43 +01:00
Ed Addario | c709e1a335 | Fix MoE tensor estimation | 2025-09-14 22:38:27 +01:00
Ed Addario | 8503d59ee4 | Increase IQ options | 2025-09-13 11:49:18 +01:00
Ed Addario | 2b516068e2 | "Convexify" candidate list | 2025-09-13 09:41:52 +01:00
Ed Addario | 12e816b511 | Replace greedy allocator with lagrangian relaxation | 2025-09-13 09:24:23 +01:00
Ed Addario | 7d85993f26 | Minor refactoring | 2025-09-13 08:44:41 +01:00
Ed Addario | 4dff85fbe5 | Improve precise_lambda() efficiency | 2025-09-13 08:41:37 +01:00
Ed Addario | bc8762f27f | Capture surrounding function name | 2025-09-13 08:33:22 +01:00
Ed Addario | 886536d80a | Increase error type precision | 2025-09-13 08:27:23 +01:00
ddh0 | df082f5630 | nitpick : correct MB to MiB (#15934) | 2025-09-11 19:12:34 +02:00
    MB was incorrectly used for 1024 x 1024 bytes instead of MiB
Ed Addario | 04c07b3272 | Add better control over MSE and directional bias computation | 2025-09-10 18:00:56 +01:00
Ed Addario | eab8708244 | Minor factoring for efficiency and correctness | 2025-08-30 10:14:46 +01:00
Ed Addario | 556f6b04fe | Add --precise-lambda option | 2025-08-28 16:08:08 +01:00
Ed Addario | 66aff8fa1e | Add precise_lambda() | 2025-08-28 16:06:42 +01:00
Ed Addario | 8df1d00ae4 | Add directional scaling | 2025-08-28 16:04:28 +01:00
Ed Addario | 04946114c9 | Refactor epsilon into a function-wide variable | 2025-08-28 16:01:03 +01:00
Ed Addario | 4286690019 | Minor comment update | 2025-08-26 21:39:40 +01:00
Ed Addario | d4ac2106fb | Improve logging and some minor code refactoring | 2025-08-24 13:39:10 +01:00
Ed Addario | 61c0e01f50 | Execute bpw_overrides() only if an imatrix file is provided | 2025-08-24 13:36:03 +01:00
Ed Addario | 3856d60328 | Restrict quant types per family | 2025-08-23 14:45:07 +01:00
Ed Addario | decafae270 | Adjust bias_lambda | 2025-08-23 11:30:11 +01:00
Ed Addario | 68ae5e66ce | Improve list of candidate types | 2025-08-23 02:50:55 +01:00
Ed Addario | 73124a9921 | Refactor estimate_error() | 2025-08-23 02:17:22 +01:00
Ed Addario | f75265f55b | Fix typo | 2025-08-23 01:08:37 +01:00
Ed Addario | 9a4b115497 | Explicitly adding <atomic> include | 2025-08-23 01:08:01 +01:00
Ed Addario | 6d17889add | Log if override is from tensor-type or from bpw-target | 2025-08-22 16:58:46 +01:00
Ed Addario | fea99d051a | Refactor and combine lambdas | 2025-08-22 16:57:58 +01:00
Ed Addario | f05c8483d8 | Improve dequantized_buffer fill | 2025-08-22 09:17:58 +01:00
Ed Addario | 897decbe8a | Show skipped IQ tensors | 2025-08-22 09:15:11 +01:00
Ed Addario | 01c927fb94 | Improve pareto efficient candidate selection | 2025-08-22 09:14:14 +01:00
Ed Addario | 47cdbe2155 | Reduce sampling window to speedup process | 2025-08-22 09:11:11 +01:00
Ed Addario | 2f13fee795 | Parameterise type | 2025-08-22 09:05:55 +01:00
Ed Addario | bb0d912c1f | Update comments | 2025-08-22 09:02:56 +01:00
Ed Addario | 35c1504441 | Fix byte count for 3d or higher tensors | 2025-08-22 09:01:57 +01:00
Ed Addario | ec0afbe79f | Include embeddings and output tensors | 2025-08-22 01:46:09 +01:00
Ed Addario | 5b6f1e9fde | General code refactor | 2025-08-21 19:18:54 +01:00
Ed Addario | 9e11f82e8f | Precompute error denominator in estimate_erro() | 2025-08-21 16:25:31 +01:00
Ed Addario | 887490c5ec | Dequantise sampled rows only | 2025-08-21 15:11:49 +01:00
Ed Addario | e01dad886b | Parallelise candidate evaluation | 2025-08-21 12:47:13 +01:00
Ed Addario | 95b2ab2800 | Change error estimate to use normalised weighted MSE | 2025-08-21 10:46:37 +01:00
Ed Addario | 5ef493ea1a | Exclude embeddings and output tensor | 2025-08-21 09:48:29 +01:00
Ed Addario | 35ad0fc4ad | Improve error estimation using weighted MSE | 2025-08-20 23:27:20 +01:00
Ed Addario | b0b33b7ccb | Optimise tensor sampling | 2025-08-20 20:58:26 +01:00
Ed Addario | 3f0118d602 | Fix bias lambda bug | 2025-08-20 17:26:37 +01:00
Ed Addario | 52da4a4f8c | Skip if output.weight or type is COPY | 2025-08-20 17:26:05 +01:00
Ed Addario | 43caadf783 | Add better fallbacks for IQ mixes | 2025-08-20 17:24:48 +01:00
Ed Addario | 29b2dc3ec0 | Do not mix K and IQ quants | 2025-08-20 13:27:01 +01:00
Ed Addario | 5cd69a6809 | Add F16/BF16 type | 2025-08-20 09:41:39 +01:00
Ed Addario | 936294f6af | Increase precision for error calculation | 2025-08-19 23:31:22 +01:00
Ed Addario | f22b3097eb | Avoid division by zero if truncation occurs | 2025-08-19 22:34:01 +01:00
Ed Addario | ee05d6bc0b | Update comments | 2025-08-19 22:32:53 +01:00
Ed Addario | 5aceb9e3ae | Refactor variable names | 2025-08-19 22:29:27 +01:00
Ed Addario | 1187f6aa9e | Implement bpw_overrides call | 2025-08-19 11:07:03 +01:00
Ed Addario | 92f49ab399 | Add target_bpw_type() logic | 2025-08-19 11:05:01 +01:00
Ed Addario | 017945a3b2 | Validate if imatrix contains activations | 2025-08-19 11:03:52 +01:00