llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	6729dedbb5	Merge branch 'master' into quantize	2026-02-20 16:47:26 +00:00
Ed Addario	f2a719b14a	Change tensor importance score logic	2026-02-20 15:05:46 +00:00
ddh0	492bc31978	quantize : add --dry-run option (#19526) * clean slate for branch * use 6 characters for tensor dims * add --dry-run to llama-quantize * use 6 characters for tensor dims (cont.) * no need to re-calculate ggml_nbytes for tensor * fix indent * show model and quant BPW when quant completes * add example to --help * new function `tensor_requires_imatrix`, add courtesy warning about imatrix * missing __func__, move imatrix flag set * logic error * fixup tensor_requires_imatrix * add missing `GGML_TYPE`s * simplify and rename `tensor_type_requires_imatrix` * simplify for style * add back Q2_K edge case for imatrix * guard ftype imatrix warning * comment ref #12557 * remove per @compilade * remove unused `params` parameter * move `bool dry_run` per GG * move `bool dry_run` per GG * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-quant.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-02-20 09:20:16 +01:00
Ed Addario	fb2af3353d	Fix bug	2026-02-14 17:31:24 +00:00
ddh0	5999b50eb0	llama-quantize : cleanup `--help` output (#19317) * cleanup `llama-quantize --help` output some much needed TLC * remove future argument oops, spoiler * cleanup of cleanup	2026-02-08 09:22:38 +02:00
Ed Addario	462d3dab82	Merge branch 'master' into quantize	2026-02-03 10:57:05 +00:00
EugeoSynthesisThirtyTwo	3dd95914d0	quantize: add option --tensor-type-file to llama-quantize (#18572) * add option --tensor-type-file to llama-quantize, but it raises an error. * add error message when file not found * quantize: update help menu, fix CI Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>	2026-01-31 11:39:21 +08:00
Ed Addario	3ba6798d45	Read statistics_data from imatrix	2026-01-21 18:27:44 +00:00
Ed Addario	05d07d8c4a	Update README.md	2026-01-07 18:32:32 +00:00
Ed Addario	26213bc805	Update usage()	2026-01-07 18:32:01 +00:00
Ed Addario	e209fb57a9	Refactor option names	2026-01-07 18:25:33 +00:00
Ed Addario	93c77f7dac	Update usage()	2026-01-07 18:12:15 +00:00
Ed Addario	097bdb34de	Add --target-size option	2026-01-07 18:10:27 +00:00
Ed Addario	0fdbe5495d	Add parse_target_size()	2026-01-07 18:08:35 +00:00
Ed Addario	efe9c8b933	Merge branch 'master' into quantize	2026-01-01 13:48:02 +00:00
Anri Lombard	33ded988ba	quantize: prevent input/output file collision (#18451) Check if input and output files are the same before quantizing to prevent file corruption when mmap reads from a file being written to. Fixes #12753	2025-12-31 23:29:03 +08:00
Ed Addario	7f88612861	Update README.md	2025-12-25 17:47:38 +00:00
Ed Addario	311c2c9f0e	Update README.md	2025-12-25 17:45:05 +00:00
Ed Addario	3be3b1ef87	Update usage()	2025-12-25 17:44:43 +00:00
Ed Addario	dfa79a9484	Merge branch 'master' into quantize	2025-12-16 13:57:54 +01:00
Xuan-Son Nguyen	6c2131773c	cli: new CLI experience (#17824) * wip * wip * fix logging, add display info * handle commands * add args * wip * move old cli to llama-completion * rm deprecation notice * move server to a shared library * move ci to llama-completion * add loading animation * add --show-timings arg * add /read command, improve LOG_ERR * add args for speculative decoding, enable show timings by default * add arg --image and --audio * fix windows build * support reasoning_content * fix llama2c workflow * color default is auto * fix merge conflicts * properly fix color problem Co-authored-by: bandoti <bandoti@users.noreply.github.com> * better loading spinner * make sure to clean color on force-exit * also clear input files on "/clear" * simplify common_log_flush * add warning in mtmd-cli * implement console writter * fix data race * add attribute * fix llama-completion and mtmd-cli * add some notes about console::log * fix compilation --------- Co-authored-by: bandoti <bandoti@users.noreply.github.com>	2025-12-10 15:28:59 +01:00
Ed Addario	b97cda6289	Add B/F16 to get_ftype()	2025-11-29 23:52:51 +00:00
Ed Addario	69a32b6f50	Relax target bpw range	2025-11-29 10:28:43 +00:00
Ed Addario	6616008420	Use more descriptive option naming	2025-11-24 18:26:45 +00:00
Ed Addario	1c9993e131	Add --disable-tensor-importance option	2025-11-23 17:51:04 +00:00
Ed Addario	9ec3e6e262	Remove processing statistics_data	2025-11-23 17:49:53 +00:00
Ed Addario	6e32244a06	Read statistics from imatrix	2025-10-30 21:53:07 +00:00
Ed Addario	00ddf039b3	Update usage	2025-10-20 21:38:49 +01:00
Ed Addario	0b3e930d52	Add option to override bpw state file name	2025-10-16 11:41:26 +01:00
Ed Addario	cd734b89ce	Update quant types	2025-10-13 15:15:23 +01:00
Ed Addario	ca282302b5	Add --keep-bpw-state option	2025-10-12 18:23:23 +01:00
Ed Addario	c93131cef6	Remove --no-bias option	2025-10-10 13:26:51 +01:00
Ed Addario	66d4aed173	Minor refactoring	2025-10-04 08:21:01 +01:00
Ed Addario	940db63144	Select quantization type if target_bpw is set unless user specifies type and threads	2025-10-03 11:08:02 +01:00
Ed Addario	dd4f4bd0b8	Reduce bpw range	2025-09-27 17:23:48 +01:00
Ed Addario	29bb30c4ed	Merge branch 'master' into quantize	2025-09-25 19:55:31 +01:00
Georgi Gerganov	1d660d2fae	ci : use smaller model (#16168) * ci : switch from gemma to qwen3 0.6b * ci : use smaller model for some tests	2025-09-22 09:11:39 +03:00
Ed Addario	9e74f83411	Replace --bpw-bias flag with --no-bias	2025-09-20 23:06:37 +01:00
Ed Addario	ab02bb1f3e	Merge branch 'master' into quantize	2025-09-20 21:41:25 +01:00
Yuri Khrustalev	07808ebb07	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
Ed Addario	04c07b3272	Add better control over MSE and directional bias computation	2025-09-10 18:00:56 +01:00
Ed Addario	556f6b04fe	Add --precise-lambda option	2025-08-28 16:08:08 +01:00
Ed Addario	d4ac2106fb	Improve logging and some minor code refactoring	2025-08-24 13:39:10 +01:00
Ed Addario	69586e212e	Add F16/BF16 type	2025-08-20 13:23:11 +01:00
Ed Addario	1b3d5b5744	Populate params	2025-08-19 10:56:02 +01:00
Ed Addario	e877474458	Process target_bpw parameter	2025-08-19 10:54:02 +01:00
Ed Addario	0edbf0c176	Process activations	2025-08-19 10:51:58 +01:00
Ed Addario	77b818c040	Populate activations_data with imatrix activations if present	2025-08-19 10:50:37 +01:00
Ed Addario	e6d55dc47b	Load activations	2025-08-19 10:49:01 +01:00
Ed Addario	5e85fb3ff3	Add parse_target_bpw()	2025-08-19 10:46:36 +01:00

60 Commits