Ed Addario | d4ac2106fb | Improve logging and some minor code refactoring | 2025-08-24 13:39:10 +01:00
Ed Addario | 61c0e01f50 | Execute bpw_overrides() only if an imatrix file is provided | 2025-08-24 13:36:03 +01:00
Ed Addario | 3856d60328 | Restrict quant types per family | 2025-08-23 14:45:07 +01:00
Ed Addario | decafae270 | Adjust bias_lambda | 2025-08-23 11:30:11 +01:00
Ed Addario | 68ae5e66ce | Improve list of candidate types | 2025-08-23 02:50:55 +01:00
Ed Addario | 73124a9921 | Refactor estimate_error() | 2025-08-23 02:17:22 +01:00
Ed Addario | f75265f55b | Fix typo | 2025-08-23 01:08:37 +01:00
Ed Addario | 9a4b115497 | Explicitly add <atomic> include | 2025-08-23 01:08:01 +01:00
Ed Addario | 6d17889add | Log whether an override comes from tensor-type or from bpw-target | 2025-08-22 16:58:46 +01:00
Ed Addario | fea99d051a | Refactor and combine lambdas | 2025-08-22 16:57:58 +01:00
Ed Addario | f05c8483d8 | Improve dequantized_buffer fill | 2025-08-22 09:17:58 +01:00
Ed Addario | 897decbe8a | Show skipped IQ tensors | 2025-08-22 09:15:11 +01:00
Ed Addario | 01c927fb94 | Improve Pareto-efficient candidate selection | 2025-08-22 09:14:14 +01:00
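A minimal sketch of what the Pareto-efficient candidate selection above likely amounts to, assuming each candidate carries a bits-per-weight cost and an estimated error (the struct and field names here are hypothetical, not llama.cpp's): keep only candidates that no other candidate beats on both axes.

```cpp
// Sketch only: 2-D Pareto front over (bits-per-weight, estimated error).
#include <algorithm>
#include <limits>
#include <vector>

struct candidate {
    float bpw;   // bits per weight of this quant type
    float err;   // estimated quantisation error
};

static std::vector<candidate> pareto_front(std::vector<candidate> cands) {
    // sort by ascending bpw, breaking ties by ascending error
    std::sort(cands.begin(), cands.end(), [](const candidate & a, const candidate & b) {
        return a.bpw != b.bpw ? a.bpw < b.bpw : a.err < b.err;
    });
    std::vector<candidate> front;
    float best_err = std::numeric_limits<float>::max();
    for (const auto & c : cands) {
        if (c.err < best_err) {  // strictly better error than every cheaper candidate
            front.push_back(c);
            best_err = c.err;
        }
    }
    return front;
}
```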
Ed Addario | 47cdbe2155 | Reduce sampling window to speed up the process | 2025-08-22 09:11:11 +01:00
Ed Addario | 2f13fee795 | Parameterise type | 2025-08-22 09:05:55 +01:00
Ed Addario | bb0d912c1f | Update comments | 2025-08-22 09:02:56 +01:00
Ed Addario | 35c1504441 | Fix byte count for 3D or higher tensors | 2025-08-22 09:01:57 +01:00
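The 3D fix above suggests a byte count that multiplied the row size by ne[1] alone, under-counting tensors with more than two dimensions. A hedged sketch of the corrected arithmetic, assuming ggml's convention of up to four dimensions ne[0..3] (the function name is illustrative):

```cpp
// Sketch only: total bytes of a (possibly 3D/4D) tensor.
// ne[0] is assumed already folded into row_size_bytes; the bug being fixed
// would be multiplying by ne[1] alone and ignoring ne[2] and ne[3].
#include <cstddef>
#include <cstdint>

static size_t tensor_nbytes(size_t row_size_bytes, const int64_t ne[4]) {
    return row_size_bytes * (size_t) ne[1] * (size_t) ne[2] * (size_t) ne[3];
}
```

Since ggml sets unused trailing dimensions to 1, the extra factors are harmless for 2D tensors.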
Ed Addario | ec0afbe79f | Include embeddings and output tensors | 2025-08-22 01:46:09 +01:00
Ed Addario | 5b6f1e9fde | General code refactor | 2025-08-21 19:18:54 +01:00
Ed Addario | 9e11f82e8f | Precompute error denominator in estimate_error() | 2025-08-21 16:25:31 +01:00
Ed Addario | 887490c5ec | Dequantise sampled rows only | 2025-08-21 15:11:49 +01:00
Ed Addario | e01dad886b | Parallelise candidate evaluation | 2025-08-21 12:47:13 +01:00
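A sketch of one common way to parallelise per-candidate evaluation, handing out work through an atomic counter, which would also explain the explicit `<atomic>` include fixed above. The stub body of `estimate_error` and the driver signature are assumptions for illustration only:

```cpp
// Sketch only: fan candidate evaluations out over n_threads, using an atomic
// counter as the work queue.
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// hypothetical stand-in for the real per-candidate error estimate
static float estimate_error(size_t candidate) { return (float) candidate; }

static void evaluate_candidates(std::vector<float> & errors, unsigned n_threads) {
    std::atomic<size_t> next(0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&errors, &next]() {
            for (size_t i; (i = next.fetch_add(1)) < errors.size(); ) {
                errors[i] = estimate_error(i);  // each index is claimed exactly once
            }
        });
    }
    for (auto & w : workers) {
        w.join();
    }
}
```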
Ed Addario | 95b2ab2800 | Change error estimate to use normalised weighted MSE | 2025-08-21 10:46:37 +01:00
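A hedged sketch of what the normalised weighted MSE above plausibly computes: the importance-weighted squared error between original and dequantised values, divided by the weighted energy of the originals. It also reflects three neighbouring commits, double-precision accumulation, the division-by-zero guard, and the precomputable denominator; the function name and signature are illustrative.

```cpp
// Sketch only: normalised, importance-weighted MSE between original values x
// and dequantised values q, with weights w (e.g. derived from an imatrix):
//   err = sum_i w_i * (x_i - q_i)^2  /  sum_i w_i * x_i^2
#include <cstddef>

static double weighted_mse(const float * x, const float * q, const float * w, size_t n) {
    double num = 0.0;
    double den = 0.0;  // this denominator depends only on x and w, so it can be precomputed
    for (size_t i = 0; i < n; ++i) {
        const double d = (double) x[i] - (double) q[i];
        num += (double) w[i] * d * d;
        den += (double) w[i] * (double) x[i] * (double) x[i];
    }
    return den > 0.0 ? num / den : 0.0;  // guard against a zero denominator
}
```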
Ed Addario | 5ef493ea1a | Exclude embeddings and output tensors | 2025-08-21 09:48:29 +01:00
Ed Addario | 35ad0fc4ad | Improve error estimation using weighted MSE | 2025-08-20 23:27:20 +01:00
Ed Addario | b0b33b7ccb | Optimise tensor sampling | 2025-08-20 20:58:26 +01:00
Ed Addario | 3f0118d602 | Fix bias lambda bug | 2025-08-20 17:26:37 +01:00
Ed Addario | 52da4a4f8c | Skip if output.weight or type is COPY | 2025-08-20 17:26:05 +01:00
Ed Addario | 43caadf783 | Add better fallbacks for IQ mixes | 2025-08-20 17:24:48 +01:00
Ed Addario | 29b2dc3ec0 | Do not mix K and IQ quants | 2025-08-20 13:27:01 +01:00
Ed Addario | 5cd69a6809 | Add F16/BF16 type | 2025-08-20 09:41:39 +01:00
Ed Addario | 936294f6af | Increase precision for error calculation | 2025-08-19 23:31:22 +01:00
Ed Addario | f22b3097eb | Avoid division by zero if truncation occurs | 2025-08-19 22:34:01 +01:00
Ed Addario | ee05d6bc0b | Update comments | 2025-08-19 22:32:53 +01:00
Ed Addario | 5aceb9e3ae | Refactor variable names | 2025-08-19 22:29:27 +01:00
Ed Addario | 1187f6aa9e | Implement bpw_overrides call | 2025-08-19 11:07:03 +01:00
Ed Addario | 92f49ab399 | Add target_bpw_type() logic | 2025-08-19 11:05:01 +01:00
Ed Addario | 017945a3b2 | Validate that the imatrix contains activations | 2025-08-19 11:03:52 +01:00
Ed Addario | 9adae08789 | Add is_iq() | 2025-08-19 11:00:50 +01:00
Ed Addario | c96b8eef94 | Add fallback_type enum | 2025-08-19 11:00:05 +01:00
Ed Addario | a22a9deeee | Refactor variables and add target_bpw | 2025-08-19 10:57:44 +01:00
Xuan-Son Nguyen | 50aa938901 | convert : support non-mxfp4 HF model (#15153) | 2025-08-07 23:26:03 +02:00
* convert : support non-mxfp4 HF model
* rm redundant check
* disable debug check
Georgi Gerganov | fd1234cb46 | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00
* oai moe
* compat with new checkpoint
* add attn sink impl
* add rope scaling yarn
* logits match with latest transformers code
* wip chat template
* rm trailing space
* use ggml_scale_bias
* rm redundant is_swa_all
* convert interleaved gate_up
* graph : fix activation function to match reference (#7)
* vocab : handle o200k_harmony special tokens
* ggml : add attention sinks support (#1)
* llama : add attn sinks
* ggml : add attn sinks
* cuda : add attn sinks
* vulkan : add support for sinks in softmax
remove unnecessary return
* ggml : add fused swiglu_oai op (#11)
* ggml : add fused swiglu_oai op
* Update ggml/src/ggml-cpu/ops.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* update CUDA impl
* cont : metal impl
* add vulkan impl
* test-backend-ops : more test cases, clean up
* llama : remove unfused impl
* remove extra lines
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
* repack mxfp4 upon conversion
* clean up a bit
* enable thinking
* add quick hack to render only some special tokens
* fix bf16 conversion
* remove vocab hack
* webui ok
* support chat parsing for gpt-oss
* fix webui
* direct mapping mxfp4, FINALLY
* force using mxfp4
* properly use lazy tensor
* ggml : add mxfp4
ggml : use e8m0 conversion instead of powf
Co-authored-by: Diego Devesa <slarengh@gmail.com>
change kvalues_mxfp4 table to match e2m1 (#6)
metal : remove quantization for now (not used)
cuda : fix disabled CUDA graphs due to ffn moe bias
vulkan : add support for mxfp4
cont : add cm2 dequant
* ggml : add ggml_add_id (#13)
* ggml : add ggml_add_id
* add cuda impl
* llama : add weight support check for add_id
* perf opt
* add vulkan impl
* rename cuda files
* add metal impl
* allow in-place ggml_add_id
* llama : keep biases on CPU with --cpu-moe
* llama : fix compile error
ggml-ci
* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
ggml-ci
* cleanup
ggml-ci
* sycl : fix supports_op for MXFP4
ggml-ci
* fix Unknown reasoning format
* ggml-cpu : fix AVX build
ggml-ci
* fix hip build
ggml-ci
* cuda : add mxfp4 dequantization support for cuBLAS
ggml-ci
* ggml-cpu : fix mxfp4 fallback definitions for some architectures
ggml-ci
* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>
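One bullet in the gpt-oss commit body above, "ggml : use e8m0 conversion instead of powf", refers to MXFP4's E8M0 block scale: an 8-bit exponent-only value encoding a power of two. A hedged sketch of the trick, moving bits to and from a float's IEEE-754 exponent field instead of calling powf at runtime (function names are illustrative, and edge cases such as e = 0 and e = 255 are glossed over):

```cpp
// Sketch only: E8M0 stores a biased power-of-two exponent (value = 2^(e - 127)),
// so conversion can read or write the exponent field of a float32 directly.
#include <cstdint>
#include <cstring>

static uint8_t e8m0_from_float(float x) {
    uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));    // bit-cast; no powf/log2f needed
    return (uint8_t) ((bits >> 23) & 0xff);  // IEEE-754 biased exponent (bias 127)
}

static float e8m0_to_float(uint8_t e) {
    const uint32_t bits = (uint32_t) e << 23; // place e straight into the exponent field
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```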
Ed Addario | daf2dd7880 | quantize : skip tensor override when in fallback mode (#14995) | 2025-07-31 21:32:18 +02:00
Ed Addario | 982e347255 | quantize : fix minor logic flaw in --tensor-type (#14572) | 2025-07-13 18:02:17 +02:00
Tarek Dakhran | f5e96b368f | model : support LiquidAI LFM2 hybrid family (#14620) | 2025-07-11 20:27:01 +02:00
**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```
Xuan-Son Nguyen | 8846aace49 | model : gemma3n text-only (#14400) | 2025-06-26 20:34:02 +03:00
* gemma3n
* add llm_graph_input_one
Ed Addario | fa4a9f2a1c | quantize : handle user-defined pruning of whole layers (blocks) (#13037) | 2025-06-22 23:16:26 +02:00
Ed Addario | 30e5b01de2 | quantize : change int to unsigned int for KV overrides (#14197) | 2025-06-15 18:53:45 +02:00
Ed Addario | e5c834f718 | quantize : improve tensor-type pattern matching (#13033) | 2025-05-13 19:12:31 +02:00