Commit Graph

597 Commits

Author SHA1 Message Date
Ed Addario 61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided 2025-08-24 13:36:03 +01:00
Ed Addario 3856d60328
Restrict quant types per family 2025-08-23 14:45:07 +01:00
Ed Addario decafae270
Adjust bias_lambda 2025-08-23 11:30:11 +01:00
Ed Addario 68ae5e66ce
Improve list of candidate types 2025-08-23 02:50:55 +01:00
Ed Addario 73124a9921
Refactor estimate_error() 2025-08-23 02:17:22 +01:00
Ed Addario f75265f55b
Fix typo 2025-08-23 01:08:37 +01:00
Ed Addario 9a4b115497
Explicitly adding <atomic> include 2025-08-23 01:08:01 +01:00
Ed Addario 6d17889add
Log if override is from tensor-type or from bpw-target 2025-08-22 16:58:46 +01:00
Ed Addario fea99d051a
Refactor and combine lambdas 2025-08-22 16:57:58 +01:00
Ed Addario f05c8483d8
Improve dequantized_buffer fill 2025-08-22 09:17:58 +01:00
Ed Addario 897decbe8a
Show skipped IQ tensors 2025-08-22 09:15:11 +01:00
Ed Addario 01c927fb94
Improve pareto efficient candidate selection 2025-08-22 09:14:14 +01:00
Ed Addario 47cdbe2155
Reduce sampling window to speed up process 2025-08-22 09:11:11 +01:00
Ed Addario 2f13fee795
Parameterise type 2025-08-22 09:05:55 +01:00
Ed Addario bb0d912c1f
Update comments 2025-08-22 09:02:56 +01:00
Ed Addario 35c1504441
Fix byte count for 3d or higher tensors 2025-08-22 09:01:57 +01:00
Ed Addario ec0afbe79f
Include embeddings and output tensors 2025-08-22 01:46:09 +01:00
Ed Addario e6eefa68f1
Merge branch 'master' into quantize 2025-08-21 19:22:24 +01:00
Ed Addario 5b6f1e9fde
General code refactor 2025-08-21 19:18:54 +01:00
Georgi Gerganov cd36b5e5c7
llama : remove deprecated llama_kv_self API (#15472)
ggml-ci
2025-08-21 19:13:45 +03:00
Georgi Gerganov 3f196be84b
graph : remove build_attn_with_sinks overload (#15469)
ggml-ci
2025-08-21 18:44:45 +03:00
Ed Addario 9e11f82e8f
Precompute error denominator in estimate_error() 2025-08-21 16:25:31 +01:00
Ed Addario 887490c5ec
Dequantise sampled rows only 2025-08-21 15:11:49 +01:00
Georgi Gerganov 715a6db02c
kv-cache : drop the "unified" prefix (#15467)
* kv-cache : drop the "unified" prefix

ggml-ci

* cont : fix comment [no ci]
2025-08-21 17:00:33 +03:00
Ed Addario e01dad886b
Parallelise candidate evaluation 2025-08-21 12:47:13 +01:00
Ed Addario 95b2ab2800
Change error estimate to use normalised weighted MSE 2025-08-21 10:46:37 +01:00
Ed Addario 5ef493ea1a
Exclude embeddings and output tensor 2025-08-21 09:48:29 +01:00
Ed Addario 35ad0fc4ad
Improve error estimation using weighted MSE 2025-08-20 23:27:20 +01:00
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00
Ed Addario 43caadf783
Add better fallbacks for IQ mixes 2025-08-20 17:24:48 +01:00
Ed Addario 29b2dc3ec0
Do not mix K and IQ quants 2025-08-20 13:27:01 +01:00
Ed Addario 5cd69a6809
Add F16/BF16 type 2025-08-20 09:41:39 +01:00
Ed Addario b33abae231
Merge branch 'master' into quantize 2025-08-19 23:39:07 +01:00
Ed Addario 936294f6af
Increase precision for error calculation 2025-08-19 23:31:22 +01:00
Ed Addario f22b3097eb
Avoid division by zero if truncation occurs 2025-08-19 22:34:01 +01:00
Ed Addario ee05d6bc0b
Update comments 2025-08-19 22:32:53 +01:00
Ed Addario 5aceb9e3ae
Refactor variable names 2025-08-19 22:29:27 +01:00
Georgi Gerganov 9ef6b0b835
model : add gpt-oss type strings (#15424) 2025-08-19 19:58:28 +03:00
Ed Addario 1187f6aa9e
Implement bpw_overrides call 2025-08-19 11:07:03 +01:00
Ed Addario 92f49ab399
Add target_bpw_type() logic 2025-08-19 11:05:01 +01:00
Ed Addario 017945a3b2
Validate if imatrix contains activations 2025-08-19 11:03:52 +01:00
Ed Addario 9adae08789
Add is_iq() 2025-08-19 11:00:50 +01:00
Ed Addario c96b8eef94
Add fallback_type enum 2025-08-19 11:00:05 +01:00
Ed Addario a22a9deeee
Refactor variable and add target_bpw 2025-08-19 10:57:44 +01:00
Georgi Gerganov 9d262f4bad
server : remove swa_full warning (#15399) 2025-08-19 08:45:26 +03:00
Sigbjørn Skjæret baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Daniel Bevenius 7a0de96045
llama : add 18-layer model type for Gemma 3-270m (#15319)
This commit adds support for the 18-layer model type in the Gemma3
series, which is the size of the Gemma3-270m model.

The motivation for this commit is that it was the only change required for
Gemma3-270m to be converted to GGUF format and used with llama.cpp.

Once the model has been converted and uploaded to Huggingface it can be
used like this:
```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
2025-08-14 17:56:26 +02:00
Aldehir Rojas b204a5a234
gpt-oss: implement harmony parsing (#15181)
* model : add harmony parser for gpt-oss

* gpt-oss : fix grammar trigger from causing empty stack

* gpt-oss: tweak the grammar trigger again

* gpt-oss : add support for recipient in role header

* gpt-oss : fix ungrouped tool calls in grammar

* gpt-oss : loosen function name matching during parse

* gpt-oss : clean up workarounds

* gpt-oss : add template tests

* gpt-oss : simulate thinking and tool call tags

* gpt-oss : undo think tags when reasoning_format is none

* gpt-oss : set special tokens back to user defined

* gpt-oss : update openai-gpt-oss template

* server : filter out harmony thought messages

* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00