Ed Addario
|
08146fd67f
|
Refactor side_data() and copy_or_broadcast()
|
2025-09-21 16:19:03 +01:00 |
Ed Addario
|
7386d4eadd
|
Refactor row sampling
|
2025-09-21 16:18:26 +01:00 |
Ed Addario
|
b6c008fd8a
|
Refactor helper lambdas
|
2025-09-21 16:04:13 +01:00 |
Ed Addario
|
b433fd9547
|
Refactor last budget pass
|
2025-09-21 13:43:09 +01:00 |
Ed Addario
|
c466c53808
|
Refactor pareto pruning and convexification
|
2025-09-21 13:42:54 +01:00 |
Ed Addario
|
6b8cedf3bc
|
Refactor estimate_lambda()
|
2025-09-21 13:42:31 +01:00 |
Ed Addario
|
bdefdb673c
|
Refactor copy_or_broadcast()
|
2025-09-21 13:42:07 +01:00 |
Ed Addario
|
e8e2aed17a
|
Refactor row sampling
|
2025-09-21 13:41:44 +01:00 |
Ed Addario
|
9e74f83411
|
Replace --bpw-bias flag with --no-bias
|
2025-09-20 23:06:37 +01:00 |
Ed Addario
|
ab02bb1f3e
|
Merge branch 'master' into quantize
|
2025-09-20 21:41:25 +01:00 |
Ed Addario
|
a36946997e
|
Replace fast_bias() for per slice version and remove precise_bias()
|
2025-09-20 21:36:54 +01:00 |
Ed Addario
|
14fae69a7b
|
General refactoring
|
2025-09-20 21:31:31 +01:00 |
Jie Fu (傅杰)
|
745cbcf2fe
|
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
|
2025-09-17 09:30:55 +02:00 |
Ed Addario
|
ad70fca5b2
|
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize
|
2025-09-15 07:42:37 +01:00 |
Ed Addario
|
9b857e3984
|
Merge branch 'ggml-org:master' into quantize
|
2025-09-14 23:35:43 +01:00 |
Ed Addario
|
c709e1a335
|
Fix MoE tensor estimation
|
2025-09-14 22:38:27 +01:00 |
Ed Addario
|
8503d59ee4
|
Increase IQ options
|
2025-09-13 11:49:18 +01:00 |
Ed Addario
|
2b516068e2
|
"Convexify" candidate list
|
2025-09-13 09:41:52 +01:00 |
Ed Addario
|
12e816b511
|
Replace greedy allocator with lagrangian relaxation
|
2025-09-13 09:24:23 +01:00 |
Ed Addario
|
7d85993f26
|
Minor refactoring
|
2025-09-13 08:44:41 +01:00 |
Ed Addario
|
4dff85fbe5
|
Improve precise_lambda() efficiency
|
2025-09-13 08:41:37 +01:00 |
Ed Addario
|
bc8762f27f
|
Capture surrounding function name
|
2025-09-13 08:33:22 +01:00 |
Ed Addario
|
886536d80a
|
Increase error type precision
|
2025-09-13 08:27:23 +01:00 |
ddh0
|
df082f5630
|
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
|
2025-09-11 19:12:34 +02:00 |
Ed Addario
|
04c07b3272
|
Add better control over MSE and directional bias computation
|
2025-09-10 18:00:56 +01:00 |
Ed Addario
|
eab8708244
|
Minor factoring for efficiency and correctness
|
2025-08-30 10:14:46 +01:00 |
Ed Addario
|
556f6b04fe
|
Add --precise-lambda option
|
2025-08-28 16:08:08 +01:00 |
Ed Addario
|
66aff8fa1e
|
Add precise_lambda()
|
2025-08-28 16:06:42 +01:00 |
Ed Addario
|
8df1d00ae4
|
Add directional scaling
|
2025-08-28 16:04:28 +01:00 |
Ed Addario
|
04946114c9
|
Refactor epsilon into a function-wide variable
|
2025-08-28 16:01:03 +01:00 |
Ed Addario
|
4286690019
|
Minor comment update
|
2025-08-26 21:39:40 +01:00 |
Ed Addario
|
d4ac2106fb
|
Improve logging and some minor code refactoring
|
2025-08-24 13:39:10 +01:00 |
Ed Addario
|
61c0e01f50
|
Execute bpw_overrides() only if an imatrix file is provided
|
2025-08-24 13:36:03 +01:00 |
Ed Addario
|
3856d60328
|
Restrict quant types per family
|
2025-08-23 14:45:07 +01:00 |
Ed Addario
|
decafae270
|
Adjust bias_lambda
|
2025-08-23 11:30:11 +01:00 |
Ed Addario
|
68ae5e66ce
|
Improve list of candidate types
|
2025-08-23 02:50:55 +01:00 |
Ed Addario
|
73124a9921
|
Refactor estimate_error()
|
2025-08-23 02:17:22 +01:00 |
Ed Addario
|
f75265f55b
|
Fix typo
|
2025-08-23 01:08:37 +01:00 |
Ed Addario
|
9a4b115497
|
Explicitly adding <atomic> include
|
2025-08-23 01:08:01 +01:00 |
Ed Addario
|
6d17889add
|
Log if override is from tensor-type or from bpw-target
|
2025-08-22 16:58:46 +01:00 |
Ed Addario
|
fea99d051a
|
Refactor and combine lambdas
|
2025-08-22 16:57:58 +01:00 |
Ed Addario
|
f05c8483d8
|
Improve dequantized_buffer fill
|
2025-08-22 09:17:58 +01:00 |
Ed Addario
|
897decbe8a
|
Show skipped IQ tensors
|
2025-08-22 09:15:11 +01:00 |
Ed Addario
|
01c927fb94
|
Improve pareto efficient candidate selection
|
2025-08-22 09:14:14 +01:00 |
Ed Addario
|
47cdbe2155
|
Reduce sampling window to speedup process
|
2025-08-22 09:11:11 +01:00 |
Ed Addario
|
2f13fee795
|
Parameterise type
|
2025-08-22 09:05:55 +01:00 |
Ed Addario
|
bb0d912c1f
|
Update comments
|
2025-08-22 09:02:56 +01:00 |
Ed Addario
|
35c1504441
|
Fix byte count for 3d or higher tensors
|
2025-08-22 09:01:57 +01:00 |
Ed Addario
|
ec0afbe79f
|
Include embeddings and output tensors
|
2025-08-22 01:46:09 +01:00 |
Ed Addario
|
5b6f1e9fde
|
General code refactor
|
2025-08-21 19:18:54 +01:00 |