Commit Graph

54 Commits

Author SHA1 Message Date
Ed Addario 9bb8e17e04
Remove wce flag 2026-03-12 19:55:56 +00:00
Ed Addario 0ccf5e5f21
Test removing unused headers 2026-03-12 16:04:36 +00:00
Ed Addario fd64e639ab
Merge branch 'master' into quantize 2026-03-12 15:43:01 +00:00
ddh0 1dab5f5a44
llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770)
* quantize : imatrix-fail early + code cleanup

* fix manual override printing

it's in the preliminary loop now, so needs to be on its own line

* revert header changes per ggerganov

* remove old #includes

* clarify naming

rename `tensor_quantization` to `tensor_typo_option` to describe its
functionality

* fix per barto
2026-03-10 08:16:05 +02:00
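The fail-early idea in the entry above can be sketched as follows. This is an illustrative reconstruction, not the actual llama-quant code: the enum values and both function names are hypothetical stand-ins, and the real set of types needing importance-matrix data is larger.

```cpp
// Hedged sketch: very low-bit quant types need importance-matrix (imatrix)
// data, so the tool should refuse up front instead of aborting mid-run.
// Names and the type list here are illustrative only.
#include <stdexcept>
#include <string>

enum tensor_type { TYPE_Q8_0, TYPE_Q2_K, TYPE_IQ2_XS }; // hypothetical stand-in

bool type_requires_imatrix(tensor_type t) {
    switch (t) {
        case TYPE_Q2_K:
        case TYPE_IQ2_XS:
            return true;
        default:
            return false;
    }
}

void check_imatrix_or_fail(tensor_type t, bool have_imatrix) {
    if (type_requires_imatrix(t) && !have_imatrix) {
        throw std::runtime_error("quantize: this type requires an imatrix; pass --imatrix <file>");
    }
}
```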
SamareshSingh cb8f4fa3f8
Fix locale-dependent float printing in GGUF metadata (#17331)
* Set C locale for consistent float formatting across all binaries.

* Add C locale setting to all tools binaries

Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/
directory to ensure consistent floating-point formatting.

* Apply suggestion from @JohannesGaessler

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-04 09:30:40 +01:00
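The commit above states the fix directly: add `std::setlocale(LC_NUMERIC, "C")` to each tool binary. A minimal sketch of why that matters (the wrapper function name is made up for illustration):

```cpp
// GGUF metadata is written with printf-style formatting, so under a locale
// like de_DE.UTF-8 a float prints as "0,5" and corrupts the metadata.
// Forcing LC_NUMERIC to "C" keeps the decimal separator a '.'.
#include <clocale>
#include <cstdio>
#include <string>

std::string format_float_c_locale(float v) {
    std::setlocale(LC_NUMERIC, "C"); // the one-liner the patch adds to each tool's main()
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.6f", v);
    return buf;
}
```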
Ed Addario 6729dedbb5
Merge branch 'master' into quantize 2026-02-20 16:47:26 +00:00
Ed Addario f2a719b14a
Change tensor importance score logic 2026-02-20 15:05:46 +00:00
ddh0 492bc31978
quantize : add --dry-run option (#19526)
* clean slate for branch

* use 6 characters for tensor dims

* add --dry-run to llama-quantize

* use 6 characters for tensor dims (cont.)

* no need to re-calculate ggml_nbytes for tensor

* fix indent

* show model and quant BPW when quant completes

* add example to --help

* new function `tensor_requires_imatrix`, add courtesy warning about imatrix

* missing __func__, move imatrix flag set

* logic error

* fixup tensor_requires_imatrix

* add missing `GGML_TYPE`s

* simplify and rename `tensor_type_requires_imatrix`

* simplify for style

* add back Q2_K edge case for imatrix

* guard ftype imatrix warning

* comment ref #12557

* remove per @compilade

* remove unused `params` parameter

* move `bool dry_run` per GG

* move `bool dry_run` per GG

* Update src/llama-quant.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-quant.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-quant.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-20 09:20:16 +01:00
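What a `--dry-run` pass computes can be sketched roughly as below. This is not the actual llama-quant implementation; the struct and function are hypothetical, standing in for walking the tensor list, choosing each destination type, and summing `ggml_nbytes`-style sizes to report the final size and bits-per-weight without writing anything.

```cpp
// Illustrative dry-run size estimate: sum per-tensor sizes from element
// counts and the chosen quant type's bits-per-weight. No file is written.
#include <cstdint>
#include <string>
#include <vector>

struct tensor_plan {
    std::string name;       // tensor name
    int64_t     n_elements; // element count
    double      bpw;        // bits per weight of the chosen quant type
};

double planned_size_bytes(const std::vector<tensor_plan> & plan) {
    double bytes = 0.0;
    for (const auto & t : plan) {
        bytes += t.n_elements * t.bpw / 8.0; // bits -> bytes
    }
    return bytes;
}
```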
Ed Addario fb2af3353d
Fix bug 2026-02-14 17:31:24 +00:00
ddh0 5999b50eb0
llama-quantize : cleanup `--help` output (#19317)
* cleanup `llama-quantize --help` output

some much needed TLC

* remove future argument

oops, spoiler

* cleanup of cleanup
2026-02-08 09:22:38 +02:00
Ed Addario 462d3dab82
Merge branch 'master' into quantize 2026-02-03 10:57:05 +00:00
EugeoSynthesisThirtyTwo 3dd95914d0
quantize: add option --tensor-type-file to llama-quantize (#18572)
* add option --tensor-type-file to llama-quantize, but it raises an error.

* add error message when file not found

* quantize: update help menu, fix CI

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
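A hedged sketch of the `--tensor-type-file` handling described above (the helper name and line syntax are assumptions, not taken from the patch): each line maps a tensor-name pattern to a quant type, and per the commit body a missing file is reported as an error rather than silently ignored.

```cpp
// Speculative reader for a tensor-type override file. Assumed line format:
// "<pattern>=<type>", e.g. "attn_v=q6_k". Fails early if the file is missing.
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<std::string, std::string>> read_tensor_type_file(const std::string & path) {
    std::ifstream f(path);
    if (!f) {
        throw std::runtime_error("failed to open tensor type file: " + path);
    }
    std::vector<std::pair<std::string, std::string>> overrides;
    std::string line;
    while (std::getline(f, line)) {
        if (line.empty()) continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue; // skip malformed lines
        overrides.emplace_back(line.substr(0, eq), line.substr(eq + 1));
    }
    return overrides;
}
```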
Ed Addario 3ba6798d45
Read statistics_data from imatrix 2026-01-21 18:27:44 +00:00
Ed Addario 26213bc805
Update usage() 2026-01-07 18:32:01 +00:00
Ed Addario e209fb57a9
Refactor option names 2026-01-07 18:25:33 +00:00
Ed Addario 93c77f7dac
Update usage() 2026-01-07 18:12:15 +00:00
Ed Addario 097bdb34de
Add --target-size option 2026-01-07 18:10:27 +00:00
Ed Addario 0fdbe5495d
Add parse_target_size() 2026-01-07 18:08:35 +00:00
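The `--target-size` entries above add a size parser; the real `parse_target_size()` is not shown in this log, so the following is a speculative sketch of what such a parser typically does: accept a number with an optional M/G suffix and return a byte count that can drive the bits-per-weight search.

```cpp
// Hypothetical size-string parser (name and suffix rules assumed):
// "500M" -> 500'000'000, "2G" -> 2'000'000'000, bare numbers are bytes.
#include <cstdint>
#include <stdexcept>
#include <string>

int64_t parse_size_bytes(const std::string & s) {
    size_t pos = 0;
    const double v = std::stod(s, &pos);
    int64_t mult = 1;
    if (pos < s.size()) {
        switch (s[pos]) {
            case 'M': case 'm': mult = 1000LL * 1000;        break;
            case 'G': case 'g': mult = 1000LL * 1000 * 1000; break;
            default: throw std::invalid_argument("bad size suffix: " + s);
        }
    }
    return static_cast<int64_t>(v * static_cast<double>(mult));
}
```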
Ed Addario efe9c8b933
Merge branch 'master' into quantize 2026-01-01 13:48:02 +00:00
Anri Lombard 33ded988ba
quantize: prevent input/output file collision (#18451)
Check if input and output files are the same before quantizing to prevent
file corruption when mmap reads from a file being written to.

Fixes #12753
2025-12-31 23:29:03 +08:00
Ed Addario 3be3b1ef87
Update usage() 2025-12-25 17:44:43 +00:00
Ed Addario b97cda6289
Add B/F16 to get_ftype() 2025-11-29 23:52:51 +00:00
Ed Addario 69a32b6f50
Relax target bpw range 2025-11-29 10:28:43 +00:00
Ed Addario 6616008420
Use more descriptive option naming 2025-11-24 18:26:45 +00:00
Ed Addario 1c9993e131
Add --disable-tensor-importance option 2025-11-23 17:51:04 +00:00
Ed Addario 9ec3e6e262
Remove processing statistics_data 2025-11-23 17:49:53 +00:00
Ed Addario 6e32244a06
Read statistics from imatrix 2025-10-30 21:53:07 +00:00
Ed Addario 00ddf039b3
Update usage 2025-10-20 21:38:49 +01:00
Ed Addario 0b3e930d52
Add option to override bpw state file name 2025-10-16 11:41:26 +01:00
Ed Addario cd734b89ce
Update quant types 2025-10-13 15:15:23 +01:00
Ed Addario ca282302b5
Add --keep-bpw-state option 2025-10-12 18:23:23 +01:00
Ed Addario c93131cef6
Remove --no-bias option 2025-10-10 13:26:51 +01:00
Ed Addario 66d4aed173
Minor refactoring 2025-10-04 08:21:01 +01:00
Ed Addario 940db63144
Select quantization type if target_bpw is set unless user specifies type and threads 2025-10-03 11:08:02 +01:00
Ed Addario dd4f4bd0b8
Reduce bpw range 2025-09-27 17:23:48 +01:00
Ed Addario 9e74f83411
Replace --bpw-bias flag with --no-bias 2025-09-20 23:06:37 +01:00
Ed Addario 04c07b3272
Add better control over MSE and directional bias computation 2025-09-10 18:00:56 +01:00
Ed Addario 556f6b04fe
Add --precise-lambda option 2025-08-28 16:08:08 +01:00
Ed Addario d4ac2106fb
Improve logging and some minor code refactoring 2025-08-24 13:39:10 +01:00
Ed Addario 69586e212e
Add F16/BF16 type 2025-08-20 13:23:11 +01:00
Ed Addario 1b3d5b5744
Populate params 2025-08-19 10:56:02 +01:00
Ed Addario e877474458
Process target_bpw parameter 2025-08-19 10:54:02 +01:00
Ed Addario 0edbf0c176
Process activations 2025-08-19 10:51:58 +01:00
Ed Addario 77b818c040
Populate activations_data with imatrix activations if present 2025-08-19 10:50:37 +01:00
Ed Addario e6d55dc47b
Load activations 2025-08-19 10:49:01 +01:00
Ed Addario 5e85fb3ff3
Add parse_target_bpw() 2025-08-19 10:46:36 +01:00
Ed Addario cfec4048ab
Update usage 2025-08-19 10:43:51 +01:00
Georgi Gerganov fd1234cb46
llama : add gpt-oss (#15091)
* oai moe

* compat with new checkpoint

* add attn sink impl

* add rope scaling yarn

* logits match with latest transformers code

* wip chat template

* rm trailing space

* use ggml_scale_bias

* rm redundant is_swa_all

* convert interleaved gate_up

* graph : fix activation function to match reference (#7)

* vocab : handle o200k_harmony special tokens

* ggml : add attention sinks support (#1)

* llama : add attn sinks

* ggml : add attn sinks

* cuda : add attn sinks

* vulkan : add support for sinks in softmax

remove unnecessary return

* ggml : add fused swiglu_oai op (#11)

* ggml : add fused swiglu_oai op

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update CUDA impl

* cont : metal impl

* add vulkan impl

* test-backend-ops : more test cases, clean up

* llama : remove unfused impl

* remove extra lines

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

* repack mxfp4 upon conversion

* clean up a bit

* enable thinking

* add quick hack to render only some special tokens

* fix bf16 conversion

* remove vocab hack

* webui ok

* support chat parsing for gpt-oss

* fix webui

* direct mapping mxfp4, FINALLY

* force using mxfp4

* properly use lazy tensor

* ggml : add mxfp4

ggml : use e8m0 conversion instead of powf

Co-authored-by: Diego Devesa <slarengh@gmail.com>

change kvalues_mxfp4 table to match e2m1 (#6)

metal : remove quantization for now (not used)

cuda : fix disabled CUDA graphs due to ffn moe bias

vulkan : add support for mxfp4

cont : add cm2 dequant

* ggml : add ggml_add_id (#13)

* ggml : add ggml_add_id

* add cuda impl

* llama : add weight support check for add_id

* perf opt

* add vulkan impl

* rename cuda files

* add metal impl

* allow in-place ggml_add_id

* llama : keep biases on CPU with --cpu-moe

* llama : fix compile error

ggml-ci

* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw

ggml-ci

* cleanup

ggml-ci

* sycl : fix supports_op for MXFP4

ggml-ci

* fix Unknown reasoning format

* ggml-cpu : fix AVX build

ggml-ci

* fix hip build

ggml-ci

* cuda : add mxfp4 dequantization support for cuBLAS

ggml-ci

* ggml-cpu : fix mxfp4 fallback definitions for some architectures

ggml-ci

* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>
2025-08-05 22:10:36 +03:00
Sigbjørn Skjæret 2721257e3e
quantize : fix confusing error message if ftype is invalid (#15071) 2025-08-04 18:11:02 +02:00
Ed Addario e9192bec56
quantize : fix using combined imatrix GGUFs (multiple datasets) (#14973) 2025-07-30 21:11:56 +02:00