Ed Addario | a74b410f5f | Move is_iq() into a lambda and remove unused variables | 2025-09-25 19:49:47 +01:00
Ed Addario | 8eedcf74bc | Increase scale multiplier | 2025-09-22 20:42:37 +01:00
Ed Addario | d36ee0a0a8 | Add comments to explain magic numbers | 2025-09-22 20:41:56 +01:00
Ed Addario | 7ba6001ec8 | Simplify candidates sorting | 2025-09-22 20:11:54 +01:00
Ed Addario | d79ade2e8e | Adjust for small vector size | 2025-09-22 20:11:26 +01:00
Ed Addario | f184450806 | Fix minor logic flaw | 2025-09-22 20:10:42 +01:00
Ed Addario | 1fbc59f867 | Replace slope with cross product | 2025-09-22 20:10:10 +01:00
Ed Addario | c855094dff | Exit loop if no better solution found | 2025-09-22 20:09:11 +01:00
Ed Addario | b748a1efa7 | Fix typo | 2025-09-21 22:03:54 +01:00
Ed Addario | 896cdc2121 | Refactor potential overflow | 2025-09-21 22:03:36 +01:00
Ed Addario | fecc472c61 | Fix typos in variable names | 2025-09-21 17:26:38 +01:00
Ed Addario | e92db008bc | Refactor quantisation checks into its own function | 2025-09-21 17:20:48 +01:00
Ed Addario | 814f6b66be | Minor general refactoring | 2025-09-21 16:45:09 +01:00
Ed Addario | 0d5f18303e | Refactor lagrange_penalty() | 2025-09-21 16:22:00 +01:00
Ed Addario | 9a1656eb97 | Refactor pareto optimise and convexify | 2025-09-21 16:21:35 +01:00
Ed Addario | 1a3e9ea4c8 | Refactor estimate_error() | 2025-09-21 16:21:00 +01:00
Ed Addario | a7ee915e19 | Refactor trimmed_sum() | 2025-09-21 16:20:06 +01:00
Ed Addario | b09662f86a | Refactor estimate_lambda() | 2025-09-21 16:19:49 +01:00
Ed Addario | 17be7615ce | Refactor candidate types build | 2025-09-21 16:19:28 +01:00
Ed Addario | 08146fd67f | Refactor side_data() and copy_or_broadcast() | 2025-09-21 16:19:03 +01:00
Ed Addario | 7386d4eadd | Refactor row sampling | 2025-09-21 16:18:26 +01:00
Ed Addario | b6c008fd8a | Refactor helper lambdas | 2025-09-21 16:04:13 +01:00
Ed Addario | b433fd9547 | Refactor last budget pass | 2025-09-21 13:43:09 +01:00
Ed Addario | c466c53808 | Refactor pareto pruning and convexification | 2025-09-21 13:42:54 +01:00
Ed Addario | 6b8cedf3bc | Refactor estimate_lambda() | 2025-09-21 13:42:31 +01:00
Ed Addario | bdefdb673c | Refactor copy_or_broadcast() | 2025-09-21 13:42:07 +01:00
Ed Addario | e8e2aed17a | Refactor row sampling | 2025-09-21 13:41:44 +01:00
Ed Addario | 9e74f83411 | Replace --bpw-bias flag with --no-bias | 2025-09-20 23:06:37 +01:00
Ed Addario | ab02bb1f3e | Merge branch 'master' into quantize | 2025-09-20 21:41:25 +01:00
Ed Addario | a36946997e | Replace fast_bias() for per slice version and remove precise_bias() | 2025-09-20 21:36:54 +01:00
Ed Addario | 14fae69a7b | General refactoring | 2025-09-20 21:31:31 +01:00
Jie Fu (傅杰) | 745cbcf2fe | llama-quant : fix the verification of attention layers for encoder-decoder models (#16023) | 2025-09-17 09:30:55 +02:00
    Signed-off-by: Jie Fu <jiefu@tencent.com>
Ed Addario | ad70fca5b2 | Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize | 2025-09-15 07:42:37 +01:00
Ed Addario | 9b857e3984 | Merge branch 'ggml-org:master' into quantize | 2025-09-14 23:35:43 +01:00
Ed Addario | c709e1a335 | Fix MoE tensor estimation | 2025-09-14 22:38:27 +01:00
Ed Addario | 8503d59ee4 | Increase IQ options | 2025-09-13 11:49:18 +01:00
Ed Addario | 2b516068e2 | "Convexify" candidate list | 2025-09-13 09:41:52 +01:00
Ed Addario | 12e816b511 | Replace greedy allocator with lagrangian relaxation | 2025-09-13 09:24:23 +01:00
Ed Addario | 7d85993f26 | Minor refactoring | 2025-09-13 08:44:41 +01:00
Ed Addario | 4dff85fbe5 | Improve precise_lambda() efficiency | 2025-09-13 08:41:37 +01:00
Ed Addario | bc8762f27f | Capture surrounding function name | 2025-09-13 08:33:22 +01:00
Ed Addario | 886536d80a | Increase error type precision | 2025-09-13 08:27:23 +01:00
ddh0 | df082f5630 | nitpick : correct MB to MiB (#15934) | 2025-09-11 19:12:34 +02:00
    MB was incorrectly used for 1024 x 1024 bytes instead of MiB
Ed Addario | 04c07b3272 | Add better control over MSE and directional bias computation | 2025-09-10 18:00:56 +01:00
Ed Addario | eab8708244 | Minor factoring for efficiency and correctness | 2025-08-30 10:14:46 +01:00
Ed Addario | 556f6b04fe | Add --precise-lambda option | 2025-08-28 16:08:08 +01:00
Ed Addario | 66aff8fa1e | Add precise_lambda() | 2025-08-28 16:06:42 +01:00
Ed Addario | 8df1d00ae4 | Add directional scaling | 2025-08-28 16:04:28 +01:00
Ed Addario | 04946114c9 | Refactor epsilon into a function-wide variable | 2025-08-28 16:01:03 +01:00
Ed Addario | 4286690019 | Minor comment update | 2025-08-26 21:39:40 +01:00
Ed Addario | d4ac2106fb | Improve logging and some minor code refactoring | 2025-08-24 13:39:10 +01:00
Ed Addario | 61c0e01f50 | Execute bpw_overrides() only if an imatrix file is provided | 2025-08-24 13:36:03 +01:00
Ed Addario | 3856d60328 | Restrict quant types per family | 2025-08-23 14:45:07 +01:00
Ed Addario | decafae270 | Adjust bias_lambda | 2025-08-23 11:30:11 +01:00
Ed Addario | 68ae5e66ce | Improve list of candidate types | 2025-08-23 02:50:55 +01:00
Ed Addario | 73124a9921 | Refactor estimate_error() | 2025-08-23 02:17:22 +01:00
Ed Addario | f75265f55b | Fix typo | 2025-08-23 01:08:37 +01:00
Ed Addario | 9a4b115497 | Explicitly adding <atomic> include | 2025-08-23 01:08:01 +01:00
Ed Addario | 6d17889add | Log if override is from tensor-type or from bpw-target | 2025-08-22 16:58:46 +01:00
Ed Addario | fea99d051a | Refactor and combine lambdas | 2025-08-22 16:57:58 +01:00
Ed Addario | f05c8483d8 | Improve dequantized_buffer fill | 2025-08-22 09:17:58 +01:00
Ed Addario | 897decbe8a | Show skipped IQ tensors | 2025-08-22 09:15:11 +01:00
Ed Addario | 01c927fb94 | Improve pareto efficient candidate selection | 2025-08-22 09:14:14 +01:00
Ed Addario | 47cdbe2155 | Reduce sampling window to speedup process | 2025-08-22 09:11:11 +01:00
Ed Addario | 2f13fee795 | Parameterise type | 2025-08-22 09:05:55 +01:00
Ed Addario | bb0d912c1f | Update comments | 2025-08-22 09:02:56 +01:00
Ed Addario | 35c1504441 | Fix byte count for 3d or higher tensors | 2025-08-22 09:01:57 +01:00
Ed Addario | ec0afbe79f | Include embeddings and output tensors | 2025-08-22 01:46:09 +01:00
Ed Addario | 5b6f1e9fde | General code refactor | 2025-08-21 19:18:54 +01:00
Ed Addario | 9e11f82e8f | Precompute error denominator in estimate_erro() | 2025-08-21 16:25:31 +01:00
Ed Addario | 887490c5ec | Dequantise sampled rows only | 2025-08-21 15:11:49 +01:00
Ed Addario | e01dad886b | Parallelise candidate evaluation | 2025-08-21 12:47:13 +01:00
Ed Addario | 95b2ab2800 | Change error estimate to use normalised weighted MSE | 2025-08-21 10:46:37 +01:00
Ed Addario | 5ef493ea1a | Exclude embeddings and output tensor | 2025-08-21 09:48:29 +01:00
Ed Addario | 35ad0fc4ad | Improve error estimation using weighted MSE | 2025-08-20 23:27:20 +01:00
Ed Addario | b0b33b7ccb | Optimise tensor sampling | 2025-08-20 20:58:26 +01:00
Ed Addario | 3f0118d602 | Fix bias lambda bug | 2025-08-20 17:26:37 +01:00
Ed Addario | 52da4a4f8c | Skip if output.weight or type is COPY | 2025-08-20 17:26:05 +01:00
Ed Addario | 43caadf783 | Add better fallbacks for IQ mixes | 2025-08-20 17:24:48 +01:00
Ed Addario | 29b2dc3ec0 | Do not mix K and IQ quants | 2025-08-20 13:27:01 +01:00
Ed Addario | 5cd69a6809 | Add F16/BF16 type | 2025-08-20 09:41:39 +01:00
Ed Addario | 936294f6af | Increase precision for error calculation | 2025-08-19 23:31:22 +01:00
Ed Addario | f22b3097eb | Avoid division by zero if truncation occurs | 2025-08-19 22:34:01 +01:00
Ed Addario | ee05d6bc0b | Update comments | 2025-08-19 22:32:53 +01:00
Ed Addario | 5aceb9e3ae | Refactor variable names | 2025-08-19 22:29:27 +01:00
Ed Addario | 1187f6aa9e | Implement bpw_overrides call | 2025-08-19 11:07:03 +01:00
Ed Addario | 92f49ab399 | Add target_bpw_type() logic | 2025-08-19 11:05:01 +01:00
Ed Addario | 017945a3b2 | Validate if imatrix contains activations | 2025-08-19 11:03:52 +01:00
Ed Addario | 9adae08789 | Add is_iq() | 2025-08-19 11:00:50 +01:00
Ed Addario | c96b8eef94 | Add fallback_type enum | 2025-08-19 11:00:05 +01:00
Ed Addario | a22a9deeee | Refactor variable and add target_bpw | 2025-08-19 10:57:44 +01:00
Xuan-Son Nguyen | 50aa938901 | convert : support non-mxfp4 HF model (#15153) | 2025-08-07 23:26:03 +02:00
    * convert : support non-mxfp4 HF model
    * rm redundant check
    * disable debug check
Georgi Gerganov | fd1234cb46 | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00
    * oai moe
    * compat with new checkpoint
    * add attn sink impl
    * add rope scaling yarn
    * logits match with latest transformers code
    * wip chat template
    * rm trailing space
    * use ggml_scale_bias
    * rm redundant is_swa_all
    * convert interleaved gate_up
    * graph : fix activation function to match reference (#7)
    * vocab : handle o200k_harmony special tokens
    * ggml : add attention sinks support (#1)
      * llama : add attn sinks
      * ggml : add attn sinks
      * cuda : add attn sinks
      * vulkan : add support for sinks in softmax, remove unnecessary return
    * ggml : add fused swiglu_oai op (#11)
      * ggml : add fused swiglu_oai op
      * Update ggml/src/ggml-cpu/ops.cpp
        Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
      * update CUDA impl
      * cont : metal impl
      * add vulkan impl
      * test-backend-ops : more test cases, clean up
      * llama : remove unfused impl
      * remove extra lines
      Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
      Co-authored-by: slaren <slarengh@gmail.com>
    * repack mxfp4 upon conversion
    * clean up a bit
    * enable thinking
    * add quick hack to render only some special tokens
    * fix bf16 conversion
    * remove vocab hack
    * webui ok
    * support chat parsing for gpt-oss
    * fix webui
    * direct mapping mxfp4, FINALLY
    * force using mxfp4
    * properly use lazy tensor
    * ggml : add mxfp4
      ggml : use e8m0 conversion instead of powf
      Co-authored-by: Diego Devesa <slarengh@gmail.com>
      change kvalues_mxfp4 table to match e2m1 (#6)
      metal : remove quantization for now (not used)
      cuda : fix disabled CUDA graphs due to ffn moe bias
      vulkan : add support for mxfp4
      cont : add cm2 dequant
    * ggml : add ggml_add_id (#13)
      * ggml : add ggml_add_id
      * add cuda impl
      * llama : add weight support check for add_id
      * perf opt
      * add vulkan impl
      * rename cuda files
      * add metal impl
      * allow in-place ggml_add_id
    * llama : keep biases on CPU with --cpu-moe
    * llama : fix compile error
    * cuda : add fallback for __nv_cvt_e8m0_to_bf16raw
    * cleanup
    * sycl : fix supports_op for MXFP4
    * fix Unknown reasoning format
    * ggml-cpu : fix AVX build
    * fix hip build
    * cuda : add mxfp4 dequantization support for cuBLAS
    * ggml-cpu : fix mxfp4 fallback definitions for some architectures
    * cuda : fix version required for __nv_cvt_e8m0_to_bf16raw
    Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
    Co-authored-by: slaren <slarengh@gmail.com>
Ed Addario | daf2dd7880 | quantize : skip tensor override when in fallback mode (#14995) | 2025-07-31 21:32:18 +02:00
Ed Addario | 982e347255 | quantize : fix minor logic flaw in --tensor-type (#14572) | 2025-07-13 18:02:17 +02:00
Tarek Dakhran | f5e96b368f | model : support LiquidAI LFM2 hybrid family (#14620) | 2025-07-11 20:27:01 +02:00
    **Important**
    LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
    To convert into gguf, install transformers from source:
    ```shell
    pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
    ```
Xuan-Son Nguyen | 8846aace49 | model : gemma3n text-only (#14400) | 2025-06-26 20:34:02 +03:00
    * gemma3n
    * add llm_graph_input_one
Ed Addario | fa4a9f2a1c | quantize : handle user-defined pruning of whole layers (blocks) (#13037) | 2025-06-22 23:16:26 +02:00
Ed Addario | 30e5b01de2 | quantize : change int to unsigned int for KV overrides (#14197) | 2025-06-15 18:53:45 +02:00
Ed Addario | e5c834f718 | quantize : improve tensor-type pattern matching (#13033) | 2025-05-13 19:12:31 +02:00