llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	fb2af3353d	Fix bug	2026-02-14 17:31:24 +00:00
Ed Addario	462d3dab82	Merge branch 'master' into quantize	2026-02-03 10:57:05 +00:00
EugeoSynthesisThirtyTwo	3dd95914d0	quantize: add option --tensor-type-file to llama-quantize (#18572) * add option --tensor-type-file to llama-quantize, but it raises an error. * add error message when file not found * quantize: update help menu, fix CI Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> --------- Signed-off-by: Aaron Teo <aaron.teo1@ibm.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>	2026-01-31 11:39:21 +08:00
Ed Addario	3ba6798d45	Read statistics_data from imatrix	2026-01-21 18:27:44 +00:00
Ed Addario	05d07d8c4a	Update README.md	2026-01-07 18:32:32 +00:00
Ed Addario	26213bc805	Update usage()	2026-01-07 18:32:01 +00:00
Ed Addario	e209fb57a9	Refactor option names	2026-01-07 18:25:33 +00:00
Ed Addario	93c77f7dac	Update usage()	2026-01-07 18:12:15 +00:00
Ed Addario	097bdb34de	Add --target-size option	2026-01-07 18:10:27 +00:00
Ed Addario	0fdbe5495d	Add parse_target_size()	2026-01-07 18:08:35 +00:00
Ed Addario	efe9c8b933	Merge branch 'master' into quantize	2026-01-01 13:48:02 +00:00
Anri Lombard	33ded988ba	quantize: prevent input/output file collision (#18451) Check if input and output files are the same before quantizing to prevent file corruption when mmap reads from a file being written to. Fixes #12753	2025-12-31 23:29:03 +08:00
Ed Addario	7f88612861	Update README.md	2025-12-25 17:47:38 +00:00
Ed Addario	311c2c9f0e	Update README.md	2025-12-25 17:45:05 +00:00
Ed Addario	3be3b1ef87	Update usage()	2025-12-25 17:44:43 +00:00
Ed Addario	dfa79a9484	Merge branch 'master' into quantize	2025-12-16 13:57:54 +01:00
Xuan-Son Nguyen	6c2131773c	cli: new CLI experience (#17824) * wip * wip * fix logging, add display info * handle commands * add args * wip * move old cli to llama-completion * rm deprecation notice * move server to a shared library * move ci to llama-completion * add loading animation * add --show-timings arg * add /read command, improve LOG_ERR * add args for speculative decoding, enable show timings by default * add arg --image and --audio * fix windows build * support reasoning_content * fix llama2c workflow * color default is auto * fix merge conflicts * properly fix color problem Co-authored-by: bandoti <bandoti@users.noreply.github.com> * better loading spinner * make sure to clean color on force-exit * also clear input files on "/clear" * simplify common_log_flush * add warning in mtmd-cli * implement console writter * fix data race * add attribute * fix llama-completion and mtmd-cli * add some notes about console::log * fix compilation --------- Co-authored-by: bandoti <bandoti@users.noreply.github.com>	2025-12-10 15:28:59 +01:00
Ed Addario	b97cda6289	Add B/F16 to get_ftype()	2025-11-29 23:52:51 +00:00
Ed Addario	69a32b6f50	Relax target bpw range	2025-11-29 10:28:43 +00:00
Ed Addario	6616008420	Use more descriptive option naming	2025-11-24 18:26:45 +00:00
Ed Addario	1c9993e131	Add --disable-tensor-importance option	2025-11-23 17:51:04 +00:00
Ed Addario	9ec3e6e262	Remove processing statistics_data	2025-11-23 17:49:53 +00:00
Ed Addario	6e32244a06	Read statistics from imatrix	2025-10-30 21:53:07 +00:00
Ed Addario	00ddf039b3	Update usage	2025-10-20 21:38:49 +01:00
Ed Addario	0b3e930d52	Add option to override bpw state file name	2025-10-16 11:41:26 +01:00
Ed Addario	cd734b89ce	Update quant types	2025-10-13 15:15:23 +01:00
Ed Addario	ca282302b5	Add --keep-bpw-state option	2025-10-12 18:23:23 +01:00
Ed Addario	c93131cef6	Remove --no-bias option	2025-10-10 13:26:51 +01:00
Ed Addario	66d4aed173	Minor refactoring	2025-10-04 08:21:01 +01:00
Ed Addario	940db63144	Select quantization type if target_bpw is set unless user specifies type and threads	2025-10-03 11:08:02 +01:00
Ed Addario	dd4f4bd0b8	Reduce bpw range	2025-09-27 17:23:48 +01:00
Ed Addario	29bb30c4ed	Merge branch 'master' into quantize	2025-09-25 19:55:31 +01:00
Georgi Gerganov	1d660d2fae	ci : use smaller model (#16168) * ci : switch from gemma to qwen3 0.6b * ci : use smaller model for some tests	2025-09-22 09:11:39 +03:00
Ed Addario	9e74f83411	Replace --bpw-bias flag with --no-bias	2025-09-20 23:06:37 +01:00
Ed Addario	ab02bb1f3e	Merge branch 'master' into quantize	2025-09-20 21:41:25 +01:00
Yuri Khrustalev	07808ebb07	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
Ed Addario	04c07b3272	Add better control over MSE and directional bias computation	2025-09-10 18:00:56 +01:00
Ed Addario	556f6b04fe	Add --precise-lambda option	2025-08-28 16:08:08 +01:00
Ed Addario	d4ac2106fb	Improve logging and some minor code refactoring	2025-08-24 13:39:10 +01:00
Ed Addario	69586e212e	Add F16/BF16 type	2025-08-20 13:23:11 +01:00
Ed Addario	1b3d5b5744	Populate params	2025-08-19 10:56:02 +01:00
Ed Addario	e877474458	Process target_bpw parameter	2025-08-19 10:54:02 +01:00
Ed Addario	0edbf0c176	Process activations	2025-08-19 10:51:58 +01:00
Ed Addario	77b818c040	Populate activations_data with imatrix activations if present	2025-08-19 10:50:37 +01:00
Ed Addario	e6d55dc47b	Load activations	2025-08-19 10:49:01 +01:00
Ed Addario	5e85fb3ff3	Add parse_target_bpw()	2025-08-19 10:46:36 +01:00
Ed Addario	cfec4048ab	Update usage	2025-08-19 10:43:51 +01:00
Georgi Gerganov	fd1234cb46	llama : add gpt-oss (#15091) * oai moe * compat with new checkpoint * add attn sink impl * add rope scaling yarn * logits match with latest transformers code * wip chat template * rm trailing space * use ggml_scale_bias * rm redundant is_swa_all * convert interleaved gate_up * graph : fix activation function to match reference (#7) * vocab : handle o200k_harmony special tokens * ggml : add attention sinks support (#1) * llama : add attn sinks * ggml : add attn sinks * cuda : add attn sinks * vulkan : add support for sinks in softmax remove unnecessary return * ggml : add fused swiglu_oai op (#11) * ggml : add fused swiglu_oai op * Update ggml/src/ggml-cpu/ops.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * update CUDA impl * cont : metal impl * add vulkan impl * test-backend-ops : more test cases, clean up * llama : remove unfused impl * remove extra lines --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Co-authored-by: slaren <slarengh@gmail.com> * repack mxfp4 upon conversion * clean up a bit * enable thinking * add quick hack to render only some special tokens * fix bf16 conversion * remove vocab hack * webui ok * support chat parsing for gpt-oss * fix webui * direct mapping mxfp4, FINALLY * force using mxfp4 * properly use lazy tensor * ggml : add mxfp4 ggml : use e8m0 conversion instead of powf Co-authored-by: Diego Devesa <slarengh@gmail.com> change kvalues_mxfp4 table to match e2m1 (#6) metal : remove quantization for now (not used) cuda : fix disabled CUDA graphs due to ffn moe bias vulkan : add support for mxfp4 cont : add cm2 dequant * ggml : add ggml_add_id (#13) * ggml : add ggml_add_id * add cuda impl * llama : add weight support check for add_id * perf opt * add vulkan impl * rename cuda files * add metal impl * allow in-place ggml_add_id * llama : keep biases on CPU with --cpu-moe * llama : fix compile error ggml-ci * cuda : add fallback for __nv_cvt_e8m0_to_bf16raw ggml-ci * cleanup ggml-ci * sycl : fix supports_op for MXFP4 ggml-ci * fix Unknown reasoning format * ggml-cpu : fix AVX build ggml-ci * fix hip build ggml-ci * cuda : add mxfp4 dequantization support for cuBLAS ggml-ci * ggml-cpu : fix mxfp4 fallback definitions for some architectures ggml-ci * cuda : fix version required for __nv_cvt_e8m0_to_bf16raw --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: slaren <slarengh@gmail.com>	2025-08-05 22:10:36 +03:00
Sigbjørn Skjæret	2721257e3e	quantize : fix confusing error message if ftype is invalid (#15071)	2025-08-04 18:11:02 +02:00
Ed Addario	e9192bec56	quantize : fix using combined imatrix GGUFs (multiple datasets) (#14973)	2025-07-30 21:11:56 +02:00

56 Commits