Ed Addario
fa1df81d49
Finetune heuristics
2025-10-20 20:52:23 +01:00
Ed Addario
41a0069613
Merge branch 'master' into quantize
2025-10-16 22:20:04 +01:00
Ed Addario
a5103933bb
Minor refactoring
2025-10-16 15:11:48 +01:00
Ed Addario
0b3e930d52
Add option to override bpw state file name
2025-10-16 11:41:26 +01:00
Ed Addario
a6853ea2ae
Add tensor type and depth heuristics
2025-10-16 11:20:24 +01:00
Xuan-Son Nguyen
3e3cb19f64
llama-quant: add support for mmproj (#16592)
* llama-quant: add support for mmproj
* Update src/llama.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* check prefix instead
* small fix
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-15 14:48:08 +02:00
Ed Addario
b7911f1431
Minor refactoring
2025-10-13 17:46:45 +01:00
Ed Addario
cd734b89ce
Update quant types
2025-10-13 15:15:23 +01:00
Ed Addario
b1b58e67df
Refactor signal handlers
2025-10-13 14:54:32 +01:00
Ed Addario
ca282302b5
Add --keep-bpw-state option
2025-10-12 18:23:23 +01:00
Ed Addario
b6094a97bf
Add quant types
2025-10-12 16:30:35 +01:00
Ed Addario
12e0524f3a
Reduce compute time by parallelising tensor processing - courtesy of https://github.com/ddh0
2025-10-12 15:12:15 +01:00
Ed Addario
5b0d3f6d5a
Automatically determine if bias error is significant
2025-10-11 10:04:48 +01:00
Ed Addario
c93131cef6
Remove --no-bias option
2025-10-10 13:26:51 +01:00
Ed Addario
3a3d807fc3
Remove bias mode computation
2025-10-10 13:10:42 +01:00
Ed Addario
c11184a3c1
Generate model ID hash
2025-10-09 11:58:01 +01:00
Ed Addario
044fa783c7
Fix trimming logic
2025-10-06 21:40:37 +01:00
Ed Addario
84ada44894
Uninstall signal handler and cleanup
2025-10-05 20:20:56 +01:00
Ed Addario
46706cec28
Persist progress
2025-10-05 20:20:28 +01:00
Ed Addario
74c62ed4e6
Add delete_bpw_state()
2025-10-05 20:19:03 +01:00
Ed Addario
02c3073b81
Add load_bpw_state()
2025-10-05 20:18:36 +01:00
Ed Addario
e48ca32f19
Add save_bpw_state()
2025-10-05 20:17:27 +01:00
Ed Addario
533cda3076
Add signal handler
2025-10-05 20:16:33 +01:00
Ed Addario
560e8c9d70
Relax lambda clamping
2025-10-05 14:41:42 +01:00
Ed Addario
f5d8811ddd
Prioritise important tensors
2025-10-01 19:04:43 +01:00
Ed Addario
b3b8a111a5
Compute rows based on tensor shape and slice count
2025-09-28 18:45:25 +01:00
Ed Addario
e49e241d37
Calculate bpw over all tensors
2025-09-27 17:28:39 +01:00
Ed Addario
3d75b14c0f
Simplify dequantisation
2025-09-27 17:27:58 +01:00
Ed Addario
8a2c71f471
Check for direction reversal
2025-09-27 17:27:29 +01:00
Ed Addario
87cba65908
Tighten worker allocator
2025-09-27 17:26:30 +01:00
Ed Addario
d16945730e
Refactor outlier trimming
2025-09-27 17:25:29 +01:00
Ed Addario
dd4f4bd0b8
Reduce bpw range
2025-09-27 17:23:48 +01:00
Ed Addario
dbdd179a92
Combine quant types
2025-09-25 19:50:20 +01:00
Ed Addario
a74b410f5f
Move is_iq() into a lambda and remove unused variables
2025-09-25 19:49:47 +01:00
Ed Addario
8eedcf74bc
Increase scale multiplier
2025-09-22 20:42:37 +01:00
Ed Addario
d36ee0a0a8
Add comments to explain magic numbers
2025-09-22 20:41:56 +01:00
Ed Addario
7ba6001ec8
Simplify candidates sorting
2025-09-22 20:11:54 +01:00
Ed Addario
d79ade2e8e
Adjust for small vector size
2025-09-22 20:11:26 +01:00
Ed Addario
f184450806
Fix minor logic flaw
2025-09-22 20:10:42 +01:00
Ed Addario
1fbc59f867
Replace slope with cross product
2025-09-22 20:10:10 +01:00
Ed Addario
c855094dff
Exit loop if no better solution found
2025-09-22 20:09:11 +01:00
Ed Addario
b748a1efa7
Fix typo
2025-09-21 22:03:54 +01:00
Ed Addario
896cdc2121
Refactor potential overflow
2025-09-21 22:03:36 +01:00
Ed Addario
fecc472c61
Fix typos in variable names
2025-09-21 17:26:38 +01:00
Ed Addario
e92db008bc
Refactor quantisation checks into its own function
2025-09-21 17:20:48 +01:00
Ed Addario
814f6b66be
Minor general refactoring
2025-09-21 16:45:09 +01:00
Ed Addario
0d5f18303e
Refactor lagrange_penalty()
2025-09-21 16:22:00 +01:00
Ed Addario
9a1656eb97
Refactor pareto optimise and convexify
2025-09-21 16:21:35 +01:00
Ed Addario
1a3e9ea4c8
Refactor estimate_error()
2025-09-21 16:21:00 +01:00
Ed Addario
a7ee915e19
Refactor trimmed_sum()
2025-09-21 16:20:06 +01:00
Ed Addario
b09662f86a
Refactor estimate_lambda()
2025-09-21 16:19:49 +01:00
Ed Addario
17be7615ce
Refactor candidate types build
2025-09-21 16:19:28 +01:00
Ed Addario
08146fd67f
Refactor side_data() and copy_or_broadcast()
2025-09-21 16:19:03 +01:00
Ed Addario
7386d4eadd
Refactor row sampling
2025-09-21 16:18:26 +01:00
Ed Addario
b6c008fd8a
Refactor helper lambdas
2025-09-21 16:04:13 +01:00
Ed Addario
b433fd9547
Refactor last budget pass
2025-09-21 13:43:09 +01:00
Ed Addario
c466c53808
Refactor pareto pruning and convexification
2025-09-21 13:42:54 +01:00
Ed Addario
6b8cedf3bc
Refactor estimate_lambda()
2025-09-21 13:42:31 +01:00
Ed Addario
bdefdb673c
Refactor copy_or_broadcast()
2025-09-21 13:42:07 +01:00
Ed Addario
e8e2aed17a
Refactor row sampling
2025-09-21 13:41:44 +01:00
Ed Addario
9e74f83411
Replace --bpw-bias flag with --no-bias
2025-09-20 23:06:37 +01:00
Ed Addario
ab02bb1f3e
Merge branch 'master' into quantize
2025-09-20 21:41:25 +01:00
Ed Addario
a36946997e
Replace fast_bias() for per slice version and remove precise_bias()
2025-09-20 21:36:54 +01:00
Ed Addario
14fae69a7b
General refactoring
2025-09-20 21:31:31 +01:00
Jie Fu (傅杰)
745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 09:30:55 +02:00
Ed Addario
ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize
2025-09-15 07:42:37 +01:00
Ed Addario
9b857e3984
Merge branch 'ggml-org:master' into quantize
2025-09-14 23:35:43 +01:00
Ed Addario
c709e1a335
Fix MoE tensor estimation
2025-09-14 22:38:27 +01:00
Ed Addario
8503d59ee4
Increase IQ options
2025-09-13 11:49:18 +01:00
Ed Addario
2b516068e2
"Convexify" candidate list
2025-09-13 09:41:52 +01:00
Ed Addario
12e816b511
Replace greedy allocator with lagrangian relaxation
2025-09-13 09:24:23 +01:00
Ed Addario
7d85993f26
Minor refactoring
2025-09-13 08:44:41 +01:00
Ed Addario
4dff85fbe5
Improve precise_lambda() efficiency
2025-09-13 08:41:37 +01:00
Ed Addario
bc8762f27f
Capture surrounding function name
2025-09-13 08:33:22 +01:00
Ed Addario
886536d80a
Increase error type precision
2025-09-13 08:27:23 +01:00
ddh0
df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00
Ed Addario
04c07b3272
Add better control over MSE and directional bias computation
2025-09-10 18:00:56 +01:00
Ed Addario
eab8708244
Minor factoring for efficiency and correctness
2025-08-30 10:14:46 +01:00
Ed Addario
556f6b04fe
Add --precise-lambda option
2025-08-28 16:08:08 +01:00
Ed Addario
66aff8fa1e
Add precise_lambda()
2025-08-28 16:06:42 +01:00
Ed Addario
8df1d00ae4
Add directional scaling
2025-08-28 16:04:28 +01:00
Ed Addario
04946114c9
Refactor epsilon into a function-wide variable
2025-08-28 16:01:03 +01:00
Ed Addario
4286690019
Minor comment update
2025-08-26 21:39:40 +01:00
Ed Addario
d4ac2106fb
Improve logging and some minor code refactoring
2025-08-24 13:39:10 +01:00
Ed Addario
61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided
2025-08-24 13:36:03 +01:00
Ed Addario
3856d60328
Restrict quant types per family
2025-08-23 14:45:07 +01:00
Ed Addario
decafae270
Adjust bias_lambda
2025-08-23 11:30:11 +01:00
Ed Addario
68ae5e66ce
Improve list of candidate types
2025-08-23 02:50:55 +01:00
Ed Addario
73124a9921
Refactor estimate_error()
2025-08-23 02:17:22 +01:00
Ed Addario
f75265f55b
Fix typo
2025-08-23 01:08:37 +01:00
Ed Addario
9a4b115497
Explicitly adding <atomic> include
2025-08-23 01:08:01 +01:00
Ed Addario
6d17889add
Log if override is from tensor-type or from bpw-target
2025-08-22 16:58:46 +01:00
Ed Addario
fea99d051a
Refactor and combine lambdas
2025-08-22 16:57:58 +01:00
Ed Addario
f05c8483d8
Improve dequantized_buffer fill
2025-08-22 09:17:58 +01:00
Ed Addario
897decbe8a
Show skipped IQ tensors
2025-08-22 09:15:11 +01:00
Ed Addario
01c927fb94
Improve pareto efficient candidate selection
2025-08-22 09:14:14 +01:00
Ed Addario
47cdbe2155
Reduce sampling window to speedup process
2025-08-22 09:11:11 +01:00
Ed Addario
2f13fee795
Parameterise type
2025-08-22 09:05:55 +01:00
Ed Addario
bb0d912c1f
Update comments
2025-08-22 09:02:56 +01:00
Ed Addario
35c1504441
Fix byte count for 3d or higher tensors
2025-08-22 09:01:57 +01:00