Ed Addario
0ccf5e5f21
Test removing unused headers
2026-03-12 16:04:36 +00:00
Ed Addario
fd64e639ab
Merge branch 'master' into quantize
2026-03-12 15:43:01 +00:00
ddh0
1dab5f5a44
llama-quant : fail early on missing imatrix, refactor type selection, code cleanup (#19770)
...
* quantize : imatrix-fail early + code cleanup
* fix manual override printing
it's in the preliminary loop now, so needs to be on its own line
* revert header changes per ggerganov
* remove old #includes
* clarify naming
rename `tensor_quantization` to `tensor_type_option` to describe its
functionality
* fix per barto
2026-03-10 08:16:05 +02:00
Marcel Petrick
92f7da00b4
chore : correct typos [no ci] (#20041)
...
* fix(docs): correct typos found during code review
Non-functional changes only:
- Fixed minor spelling mistakes in comments
- Corrected typos in user-facing strings
- No variables, logic, or functional code was modified.
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
* Update docs/backend/CANN.md
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* Revert "Auxiliary commit to revert individual files from 846d1c301281178efbc6ce6060ad34c1ebe45af8"
This reverts commit 02fcf0c7db661d5ff3eff96b2b2db9fdb7213256.
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Signed-off-by: Marcel Petrick <mail@marcelpetrick.it>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-03-05 08:50:21 +01:00
SamareshSingh
cb8f4fa3f8
Fix locale-dependent float printing in GGUF metadata (#17331)
...
* Set C locale for consistent float formatting across all binaries.
* Add C locale setting to all tools binaries
Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/
directory to ensure consistent floating-point formatting.
* Apply suggestion from @JohannesGaessler
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-04 09:30:40 +01:00
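A minimal sketch of the idea behind the locale fix in #17331 above: pinning `LC_NUMERIC` to `"C"` so that printf-style formatting always uses `.` as the decimal separator. The helper name `format_float_c_locale` is hypothetical and is not taken from the llama.cpp sources; only the `std::setlocale(LC_NUMERIC, "C")` call comes from the commit message.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper for illustration only (not part of llama.cpp).
// Formats a value with three decimals after forcing the numeric locale to "C".
static std::string format_float_c_locale(double value) {
    // With a locale such as de_DE, "%f" could emit "1,500"; the "C" locale
    // guarantees '.' as the decimal separator.
    std::setlocale(LC_NUMERIC, "C");

    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.3f", value);
    return std::string(buf);
}
```

In a real tool the `std::setlocale` call would more plausibly sit once at the top of `main()`, as the commit body suggests, rather than inside a formatting helper.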
Ed Addario
6729dedbb5
Merge branch 'master' into quantize
2026-02-20 16:47:26 +00:00
Ed Addario
f2a719b14a
Change tensor importance score logic
2026-02-20 15:05:46 +00:00
ddh0
492bc31978
quantize : add --dry-run option (#19526)
...
* clean slate for branch
* use 6 characters for tensor dims
* add --dry-run to llama-quantize
* use 6 characters for tensor dims (cont.)
* no need to re-calculate ggml_nbytes for tensor
* fix indent
* show model and quant BPW when quant completes
* add example to --help
* new function `tensor_requires_imatrix`, add courtesy warning about imatrix
* missing __func__, move imatrix flag set
* logic error
* fixup tensor_requires_imatrix
* add missing `GGML_TYPE`s
* simplify and rename `tensor_type_requires_imatrix`
* simplify for style
* add back Q2_K edge case for imatrix
* guard ftype imatrix warning
* comment ref #12557
* remove per @compilade
* remove unused `params` parameter
* move `bool dry_run` per GG
* move `bool dry_run` per GG
* Update src/llama-quant.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-quant.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-quant.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-20 09:20:16 +01:00
Ed Addario
fb2af3353d
Fix bug
2026-02-14 17:31:24 +00:00
ddh0
5999b50eb0
llama-quantize : cleanup `--help` output (#19317)
...
* cleanup `llama-quantize --help` output
some much needed TLC
* remove future argument
oops, spoiler
* cleanup of cleanup
2026-02-08 09:22:38 +02:00
Ed Addario
462d3dab82
Merge branch 'master' into quantize
2026-02-03 10:57:05 +00:00
EugeoSynthesisThirtyTwo
3dd95914d0
quantize: add option --tensor-type-file to llama-quantize (#18572)
...
* add option --tensor-type-file to llama-quantize, but it raises an error.
* add error message when file not found
* quantize: update help menu, fix CI
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
2026-01-31 11:39:21 +08:00
Ed Addario
3ba6798d45
Read statistics_data from imatrix
2026-01-21 18:27:44 +00:00
Ed Addario
05d07d8c4a
Update README.md
2026-01-07 18:32:32 +00:00
Ed Addario
26213bc805
Update usage()
2026-01-07 18:32:01 +00:00
Ed Addario
e209fb57a9
Refactor option names
2026-01-07 18:25:33 +00:00
Ed Addario
93c77f7dac
Update usage()
2026-01-07 18:12:15 +00:00
Ed Addario
097bdb34de
Add --target-size option
2026-01-07 18:10:27 +00:00
Ed Addario
0fdbe5495d
Add parse_target_size()
2026-01-07 18:08:35 +00:00
Ed Addario
efe9c8b933
Merge branch 'master' into quantize
2026-01-01 13:48:02 +00:00
Anri Lombard
33ded988ba
quantize: prevent input/output file collision (#18451)
...
Check if input and output files are the same before quantizing to prevent
file corruption when mmap reads from a file being written to.
Fixes #12753
2025-12-31 23:29:03 +08:00
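A rough sketch of the kind of check #18451 above describes: refusing to quantize when the input and output paths resolve to the same file. This is an assumption about one way to do it using `std::filesystem::equivalent`; the helper name `is_same_file` is hypothetical, and the actual commit may compare paths differently.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper for illustration only (not part of llama.cpp).
// Returns true when both paths exist and refer to the same file system entity
// (this also catches symlinks, hard links, and differently spelled paths).
static bool is_same_file(const std::string & input, const std::string & output) {
    std::error_code ec;

    // A not-yet-created output file cannot collide with the input.
    if (!fs::exists(input, ec) || !fs::exists(output, ec)) {
        return false;
    }

    const bool same = fs::equivalent(input, output, ec);
    return !ec && same;
}
```

A caller would typically run this before opening the output for writing and abort with an error message if it returns `true`, since writing to a file that is still being read through mmap can corrupt it.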
Ed Addario
7f88612861
Update README.md
2025-12-25 17:47:38 +00:00
Ed Addario
311c2c9f0e
Update README.md
2025-12-25 17:45:05 +00:00
Ed Addario
3be3b1ef87
Update usage()
2025-12-25 17:44:43 +00:00
Ed Addario
dfa79a9484
Merge branch 'master' into quantize
2025-12-16 13:57:54 +01:00
Xuan-Son Nguyen
6c2131773c
cli: new CLI experience (#17824)
...
* wip
* wip
* fix logging, add display info
* handle commands
* add args
* wip
* move old cli to llama-completion
* rm deprecation notice
* move server to a shared library
* move ci to llama-completion
* add loading animation
* add --show-timings arg
* add /read command, improve LOG_ERR
* add args for speculative decoding, enable show timings by default
* add arg --image and --audio
* fix windows build
* support reasoning_content
* fix llama2c workflow
* color default is auto
* fix merge conflicts
* properly fix color problem
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
* better loading spinner
* make sure to clean color on force-exit
* also clear input files on "/clear"
* simplify common_log_flush
* add warning in mtmd-cli
* implement console writer
* fix data race
* add attribute
* fix llama-completion and mtmd-cli
* add some notes about console::log
* fix compilation
---------
Co-authored-by: bandoti <bandoti@users.noreply.github.com>
2025-12-10 15:28:59 +01:00
Ed Addario
b97cda6289
Add B/F16 to get_ftype()
2025-11-29 23:52:51 +00:00
Ed Addario
69a32b6f50
Relax target bpw range
2025-11-29 10:28:43 +00:00
Ed Addario
6616008420
Use more descriptive option naming
2025-11-24 18:26:45 +00:00
Ed Addario
1c9993e131
Add --disable-tensor-importance option
2025-11-23 17:51:04 +00:00
Ed Addario
9ec3e6e262
Remove processing statistics_data
2025-11-23 17:49:53 +00:00
Ed Addario
6e32244a06
Read statistics from imatrix
2025-10-30 21:53:07 +00:00
Ed Addario
00ddf039b3
Update usage
2025-10-20 21:38:49 +01:00
Ed Addario
0b3e930d52
Add option to override bpw state file name
2025-10-16 11:41:26 +01:00
Ed Addario
cd734b89ce
Update quant types
2025-10-13 15:15:23 +01:00
Ed Addario
ca282302b5
Add --keep-bpw-state option
2025-10-12 18:23:23 +01:00
Ed Addario
c93131cef6
Remove --no-bias option
2025-10-10 13:26:51 +01:00
Ed Addario
66d4aed173
Minor refactoring
2025-10-04 08:21:01 +01:00
Ed Addario
940db63144
Select quantization type if target_bpw is set unless user specifies type and threads
2025-10-03 11:08:02 +01:00
Ed Addario
dd4f4bd0b8
Reduce bpw range
2025-09-27 17:23:48 +01:00
Ed Addario
29bb30c4ed
Merge branch 'master' into quantize
2025-09-25 19:55:31 +01:00
Georgi Gerganov
1d660d2fae
ci : use smaller model (#16168)
...
* ci : switch from gemma to qwen3 0.6b
* ci : use smaller model for some tests
2025-09-22 09:11:39 +03:00
Ed Addario
9e74f83411
Replace --bpw-bias flag with --no-bias
2025-09-20 23:06:37 +01:00
Ed Addario
ab02bb1f3e
Merge branch 'master' into quantize
2025-09-20 21:41:25 +01:00
Yuri Khrustalev
07808ebb07
cmake : Do not install tools on iOS targets (#15903)
2025-09-16 09:54:44 +07:00
Ed Addario
04c07b3272
Add better control over MSE and directional bias computation
2025-09-10 18:00:56 +01:00
Ed Addario
556f6b04fe
Add --precise-lambda option
2025-08-28 16:08:08 +01:00
Ed Addario
d4ac2106fb
Improve logging and some minor code refactoring
2025-08-24 13:39:10 +01:00
Ed Addario
69586e212e
Add F16/BF16 type
2025-08-20 13:23:11 +01:00
Ed Addario
1b3d5b5744
Populate params
2025-08-19 10:56:02 +01:00