Ed Addario | a74b410f5f | Move is_iq() into a lambda and remove unused variables | 2025-09-25 19:49:47 +01:00
Ed Addario | 8eedcf74bc | Increase scale multiplier | 2025-09-22 20:42:37 +01:00
Ed Addario | d36ee0a0a8 | Add comments to explain magic numbers | 2025-09-22 20:41:56 +01:00
Ed Addario | 7ba6001ec8 | Simplify candidates sorting | 2025-09-22 20:11:54 +01:00
Ed Addario | d79ade2e8e | Adjust for small vector size | 2025-09-22 20:11:26 +01:00
Ed Addario | f184450806 | Fix minor logic flaw | 2025-09-22 20:10:42 +01:00
Ed Addario | 1fbc59f867 | Replace slope with cross product | 2025-09-22 20:10:10 +01:00
Ed Addario | c855094dff | Exit loop if no better solution found | 2025-09-22 20:09:11 +01:00
Ed Addario | b748a1efa7 | Fix typo | 2025-09-21 22:03:54 +01:00
Ed Addario | 896cdc2121 | Refactor potential overflow | 2025-09-21 22:03:36 +01:00
Ed Addario | fecc472c61 | Fix typos in variable names | 2025-09-21 17:26:38 +01:00
Ed Addario | e92db008bc | Refactor quantisation checks into its own function | 2025-09-21 17:20:48 +01:00
Ed Addario | 814f6b66be | Minor general refactoring | 2025-09-21 16:45:09 +01:00
Ed Addario | 0d5f18303e | Refactor lagrange_penalty() | 2025-09-21 16:22:00 +01:00
Ed Addario | 9a1656eb97 | Refactor pareto optimise and convexify | 2025-09-21 16:21:35 +01:00
Ed Addario | 1a3e9ea4c8 | Refactor estimate_error() | 2025-09-21 16:21:00 +01:00
Ed Addario | a7ee915e19 | Refactor trimmed_sum() | 2025-09-21 16:20:06 +01:00
Ed Addario | b09662f86a | Refactor estimate_lambda() | 2025-09-21 16:19:49 +01:00
Ed Addario | 17be7615ce | Refactor candidate types build | 2025-09-21 16:19:28 +01:00
Ed Addario | 08146fd67f | Refactor side_data() and copy_or_broadcast() | 2025-09-21 16:19:03 +01:00
Ed Addario | 7386d4eadd | Refactor row sampling | 2025-09-21 16:18:26 +01:00
Ed Addario | b6c008fd8a | Refactor helper lambdas | 2025-09-21 16:04:13 +01:00
Ed Addario | b433fd9547 | Refactor last budget pass | 2025-09-21 13:43:09 +01:00
Ed Addario | c466c53808 | Refactor pareto pruning and convexification | 2025-09-21 13:42:54 +01:00
Ed Addario | 6b8cedf3bc | Refactor estimate_lambda() | 2025-09-21 13:42:31 +01:00
Ed Addario | bdefdb673c | Refactor copy_or_broadcast() | 2025-09-21 13:42:07 +01:00
Ed Addario | e8e2aed17a | Refactor row sampling | 2025-09-21 13:41:44 +01:00
Ed Addario | 9e74f83411 | Replace --bpw-bias flag with --no-bias | 2025-09-20 23:06:37 +01:00
Ed Addario | ab02bb1f3e | Merge branch 'master' into quantize | 2025-09-20 21:41:25 +01:00
Ed Addario | a36946997e | Replace fast_bias() for per slice version and remove precise_bias() | 2025-09-20 21:36:54 +01:00
Ed Addario | 14fae69a7b | General refactoring | 2025-09-20 21:31:31 +01:00
Jie Fu (傅杰) | 745cbcf2fe | llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) | 2025-09-17 09:30:55 +02:00
    Signed-off-by: Jie Fu <jiefu@tencent.com>
Ed Addario | ad70fca5b2 | Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize | 2025-09-15 07:42:37 +01:00
Ed Addario | 9b857e3984 | Merge branch 'ggml-org:master' into quantize | 2025-09-14 23:35:43 +01:00
Ed Addario | c709e1a335 | Fix MoE tensor estimation | 2025-09-14 22:38:27 +01:00
Ed Addario | 8503d59ee4 | Increase IQ options | 2025-09-13 11:49:18 +01:00
Ed Addario | 2b516068e2 | "Convexify" candidate list | 2025-09-13 09:41:52 +01:00
Ed Addario | 12e816b511 | Replace greedy allocator with lagrangian relaxation | 2025-09-13 09:24:23 +01:00
Ed Addario | 7d85993f26 | Minor refactoring | 2025-09-13 08:44:41 +01:00
Ed Addario | 4dff85fbe5 | Improve precise_lambda() efficiency | 2025-09-13 08:41:37 +01:00
Ed Addario | bc8762f27f | Capture surrounding function name | 2025-09-13 08:33:22 +01:00
Ed Addario | 886536d80a | Increase error type precision | 2025-09-13 08:27:23 +01:00
ddh0 | df082f5630 | nitpick : correct MB to MiB (#15934) | 2025-09-11 19:12:34 +02:00
    MB was incorrectly used for 1024 x 1024 bytes instead of MiB
Ed Addario | 04c07b3272 | Add better control over MSE and directional bias computation | 2025-09-10 18:00:56 +01:00
Ed Addario | eab8708244 | Minor factoring for efficiency and correctness | 2025-08-30 10:14:46 +01:00
Ed Addario | 556f6b04fe | Add --precise-lambda option | 2025-08-28 16:08:08 +01:00
Ed Addario | 66aff8fa1e | Add precise_lambda() | 2025-08-28 16:06:42 +01:00
Ed Addario | 8df1d00ae4 | Add directional scaling | 2025-08-28 16:04:28 +01:00
Ed Addario | 04946114c9 | Refactor epsilon into a function-wide variable | 2025-08-28 16:01:03 +01:00
Ed Addario | 4286690019 | Minor comment update | 2025-08-26 21:39:40 +01:00
Ed Addario | d4ac2106fb | Improve logging and some minor code refactoring | 2025-08-24 13:39:10 +01:00
Ed Addario | 61c0e01f50 | Execute bpw_overrides() only if an imatrix file is provided | 2025-08-24 13:36:03 +01:00
Ed Addario | 3856d60328 | Restrict quant types per family | 2025-08-23 14:45:07 +01:00
Ed Addario | decafae270 | Adjust bias_lambda | 2025-08-23 11:30:11 +01:00
Ed Addario | 68ae5e66ce | Improve list of candidate types | 2025-08-23 02:50:55 +01:00
Ed Addario | 73124a9921 | Refactor estimate_error() | 2025-08-23 02:17:22 +01:00
Ed Addario | f75265f55b | Fix typo | 2025-08-23 01:08:37 +01:00
Ed Addario | 9a4b115497 | Explicitly adding <atomic> include | 2025-08-23 01:08:01 +01:00
Ed Addario | 6d17889add | Log if override is from tensor-type or from bpw-target | 2025-08-22 16:58:46 +01:00
Ed Addario | fea99d051a | Refactor and combine lambdas | 2025-08-22 16:57:58 +01:00
Ed Addario | f05c8483d8 | Improve dequantized_buffer fill | 2025-08-22 09:17:58 +01:00
Ed Addario | 897decbe8a | Show skipped IQ tensors | 2025-08-22 09:15:11 +01:00
Ed Addario | 01c927fb94 | Improve pareto efficient candidate selection | 2025-08-22 09:14:14 +01:00
Ed Addario | 47cdbe2155 | Reduce sampling window to speedup process | 2025-08-22 09:11:11 +01:00
Ed Addario | 2f13fee795 | Parameterise type | 2025-08-22 09:05:55 +01:00
Ed Addario | bb0d912c1f | Update comments | 2025-08-22 09:02:56 +01:00
Ed Addario | 35c1504441 | Fix byte count for 3d or higher tensors | 2025-08-22 09:01:57 +01:00
Ed Addario | ec0afbe79f | Include embeddings and output tensors | 2025-08-22 01:46:09 +01:00
Ed Addario | 5b6f1e9fde | General code refactor | 2025-08-21 19:18:54 +01:00
Ed Addario | 9e11f82e8f | Precompute error denominator in estimate_erro() | 2025-08-21 16:25:31 +01:00
Ed Addario | 887490c5ec | Dequantise sampled rows only | 2025-08-21 15:11:49 +01:00
Ed Addario | e01dad886b | Parallelise candidate evaluation | 2025-08-21 12:47:13 +01:00
Ed Addario | 95b2ab2800 | Change error estimate to use normalised weighted MSE | 2025-08-21 10:46:37 +01:00
Ed Addario | 5ef493ea1a | Exclude embeddings and output tensor | 2025-08-21 09:48:29 +01:00
Ed Addario | 35ad0fc4ad | Improve error estimation using weighted MSE | 2025-08-20 23:27:20 +01:00
Ed Addario | b0b33b7ccb | Optimise tensor sampling | 2025-08-20 20:58:26 +01:00
Ed Addario | 3f0118d602 | Fix bias lambda bug | 2025-08-20 17:26:37 +01:00
Ed Addario | 52da4a4f8c | Skip if output.weight or type is COPY | 2025-08-20 17:26:05 +01:00
Ed Addario | 43caadf783 | Add better fallbacks for IQ mixes | 2025-08-20 17:24:48 +01:00
Ed Addario | 29b2dc3ec0 | Do not mix K and IQ quants | 2025-08-20 13:27:01 +01:00
Ed Addario | 5cd69a6809 | Add F16/BF16 type | 2025-08-20 09:41:39 +01:00
Ed Addario | 936294f6af | Increase precision for error calculation | 2025-08-19 23:31:22 +01:00
Ed Addario | f22b3097eb | Avoid division by zero if truncation occurs | 2025-08-19 22:34:01 +01:00
Ed Addario | ee05d6bc0b | Update comments | 2025-08-19 22:32:53 +01:00
Ed Addario | 5aceb9e3ae | Refactor variable names | 2025-08-19 22:29:27 +01:00
Ed Addario | 1187f6aa9e | Implement bpw_overrides call | 2025-08-19 11:07:03 +01:00
Ed Addario | 92f49ab399 | Add target_bpw_type() logic | 2025-08-19 11:05:01 +01:00
Ed Addario | 017945a3b2 | Validate if imatrix contains activations | 2025-08-19 11:03:52 +01:00
Ed Addario | 9adae08789 | Add is_iq() | 2025-08-19 11:00:50 +01:00
Ed Addario | c96b8eef94 | Add fallback_type enum | 2025-08-19 11:00:05 +01:00
Ed Addario | a22a9deeee | Refactor variable and add target_bpw | 2025-08-19 10:57:44 +01:00
Xuan-Son Nguyen | 50aa938901 | convert : support non-mxfp4 HF model (#15153) | 2025-08-07 23:26:03 +02:00
    * convert : support non-mxfp4 HF model
    * rm redundant check
    * disable debug check
Georgi Gerganov | fd1234cb46 | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00
    * oai moe
    * compat with new checkpoint
    * add attn sink impl
    * add rope scaling yarn
    * logits match with latest transformers code
    * wip chat template
    * rm trailing space
    * use ggml_scale_bias
    * rm redundant is_swa_all
    * convert interleaved gate_up
    * graph : fix activation function to match reference (#7)
    * vocab : handle o200k_harmony special tokens
    * ggml : add attention sinks support (#1)
      * llama : add attn sinks
      * ggml : add attn sinks
      * cuda : add attn sinks
      * vulkan : add support for sinks in softmax, remove unnecessary return
    * ggml : add fused swiglu_oai op (#11)
      * ggml : add fused swiglu_oai op
      * Update ggml/src/ggml-cpu/ops.cpp
        Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
      * update CUDA impl
      * cont : metal impl
      * add vulkan impl
      * test-backend-ops : more test cases, clean up
      * llama : remove unfused impl
      * remove extra lines
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
      Co-authored-by: slaren <slarengh@gmail.com>
    * repack mxfp4 upon conversion
    * clean up a bit
    * enable thinking
    * add quick hack to render only some special tokens
    * fix bf16 conversion
    * remove vocab hack
    * webui ok
    * support chat parsing for gpt-oss
    * fix webui
    * direct mapping mxfp4, FINALLY
    * force using mxfp4
    * properly use lazy tensor
    * ggml : add mxfp4
      ggml : use e8m0 conversion instead of powf
      Co-authored-by: Diego Devesa <slarengh@gmail.com>
      change kvalues_mxfp4 table to match e2m1 (#6)
      metal : remove quantization for now (not used)
      cuda : fix disabled CUDA graphs due to ffn moe bias
      vulkan : add support for mxfp4
      cont : add cm2 dequant
    * ggml : add ggml_add_id (#13)
      * ggml : add ggml_add_id
      * add cuda impl
      * llama : add weight support check for add_id
      * perf opt
      * add vulkan impl
      * rename cuda files
      * add metal impl
      * allow in-place ggml_add_id
    * llama : keep biases on CPU with --cpu-moe
    * llama : fix compile error
    * cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
    * cleanup
    * sycl : fix supports_op for MXFP4
    * fix Unknown reasoning format
    * ggml-cpu : fix AVX build
    * fix hip build
    * cuda : add mxfp4 dequantization support for cuBLAS
    * ggml-cpu : fix mxfp4 fallback definitions for some architectures
    * cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
    Co-authored-by: slaren <slarengh@gmail.com>
Ed Addario | daf2dd7880 | quantize : skip tensor override when in fallback mode (#14995) | 2025-07-31 21:32:18 +02:00
Ed Addario | 982e347255 | quantize : fix minor logic flaw in --tensor-type (#14572) | 2025-07-13 18:02:17 +02:00
Tarek Dakhran | f5e96b368f | model : support LiquidAI LFM2 hybrid family (#14620) | 2025-07-11 20:27:01 +02:00
    **Important**
    LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
    To convert into gguf, install transformers from source:
    ```shell
    pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
    ```
Xuan-Son Nguyen | 8846aace49 | model : gemma3n text-only (#14400) | 2025-06-26 20:34:02 +03:00
    * gemma3n
    * add llm_graph_input_one
Ed Addario | fa4a9f2a1c | quantize : handle user-defined pruning of whole layers (blocks) (#13037) | 2025-06-22 23:16:26 +02:00
Ed Addario | 30e5b01de2 | quantize : change int to unsigned int for KV overrides (#14197) | 2025-06-15 18:53:45 +02:00
Ed Addario | e5c834f718 | quantize : improve tensor-type pattern matching (#13033) | 2025-05-13 19:12:31 +02:00