llama.cpp

Commit Graph

Author	SHA1	Message	Date
Ed Addario	66aff8fa1e	Add precise_lambda()	2025-08-28 16:06:42 +01:00
Ed Addario	8df1d00ae4	Add directional scaling	2025-08-28 16:04:28 +01:00
Ed Addario	04946114c9	Refactor epsilon into a function-wide variable	2025-08-28 16:01:03 +01:00
Ed Addario	4286690019	Minor comment update	2025-08-26 21:39:40 +01:00
Ed Addario	ccaab24441	Merge branch 'master' into quantize	2025-08-24 20:47:53 +01:00
Ed Addario	d4ac2106fb	Improve logging and some minor code refactoring	2025-08-24 13:39:10 +01:00
Ed Addario	61c0e01f50	Execute bpw_overrides() only if an imatrix file is provided	2025-08-24 13:36:03 +01:00
Georgi Gerganov	b730706a49	kv-cache : support layer reuse (#15504) * kv-cache : support layer reuse ggml-ci * cont : update comments [no ci]	2025-08-24 13:07:07 +03:00
Ed Addario	3856d60328	Restrict quant types per family	2025-08-23 14:45:07 +01:00
Piotr Wilkin (ilintar)	b1afcab804	model : add support for Seed-OSS (#15490) * First draft * Fix linter errors * Added missing sinks nullptr * Don't forget the llama-arch! * We're through to the generation stage. * Fix post-attention norm * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Fix RoPE type * Fix tensor name and reorder llm_types * Update gguf-py/gguf/constants.py Remove nonexistent FFN_POST_NORM tensor Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-model.h Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add basic chat template * Add chat template tests * Remake chat template test * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-chat.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Reorder llm type descriptions * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-08-23 15:21:52 +02:00
Ed Addario	decafae270	Adjust bias_lambda	2025-08-23 11:30:11 +01:00
LaffeyNyaa	21dc4ddaf2	chat : fix debug build assertion in trim function (#15520)	2025-08-23 10:38:30 +02:00
Ed Addario	68ae5e66ce	Improve list of candidate types	2025-08-23 02:50:55 +01:00
Ed Addario	73124a9921	Refactor estimate_error()	2025-08-23 02:17:22 +01:00
Ed Addario	f75265f55b	Fix typo	2025-08-23 01:08:37 +01:00
Ed Addario	9a4b115497	Explicitly adding <atomic> include	2025-08-23 01:08:01 +01:00
Ed Addario	6d17889add	Log if override is from tensor-type or from bpw-target	2025-08-22 16:58:46 +01:00
Ed Addario	fea99d051a	Refactor and combine lambdas	2025-08-22 16:57:58 +01:00
Georgi Gerganov	9ebebef62f	llama : remove KV cache defragmentation logic (#15473) ggml-ci	2025-08-22 12:22:13 +03:00
Ed Addario	f05c8483d8	Improve dequantized_buffer fill	2025-08-22 09:17:58 +01:00
Ed Addario	897decbe8a	Show skipped IQ tensors	2025-08-22 09:15:11 +01:00
Ed Addario	01c927fb94	Improve pareto efficient candidate selection	2025-08-22 09:14:14 +01:00
Ed Addario	47cdbe2155	Reduce sampling window to speedup process	2025-08-22 09:11:11 +01:00
Ed Addario	2f13fee795	Parameterise type	2025-08-22 09:05:55 +01:00
Ed Addario	bb0d912c1f	Update comments	2025-08-22 09:02:56 +01:00
Ed Addario	35c1504441	Fix byte count for 3d or higher tensors	2025-08-22 09:01:57 +01:00
Tarek Dakhran	e288693669	readme : model : mtdm : lfm2 improvements (#15476) * Support untied embeddings * Increase number of image tokens to 1024 * Add LFM2-VL to readme * Actually use untied embeddings	2025-08-22 09:29:08 +02:00
Ed Addario	ec0afbe79f	Include embeddings and output tensors	2025-08-22 01:46:09 +01:00
Ed Addario	e6eefa68f1	Merge branch 'master' into quantize	2025-08-21 19:22:24 +01:00
Ed Addario	5b6f1e9fde	General code refactor	2025-08-21 19:18:54 +01:00
Georgi Gerganov	cd36b5e5c7	llama : remove deprecated llama_kv_self API (#15472) ggml-ci	2025-08-21 19:13:45 +03:00
Georgi Gerganov	3f196be84b	graph : remove build_attn_with_sinks overload (#15469) ggml-ci	2025-08-21 18:44:45 +03:00
Ed Addario	9e11f82e8f	Precompute error denominator in estimate_erro()	2025-08-21 16:25:31 +01:00
Ed Addario	887490c5ec	Dequantise sampled rows only	2025-08-21 15:11:49 +01:00
Georgi Gerganov	715a6db02c	kv-cache : drop the "unified" prefix (#15467) * kv-cache : drop the "unified" prefix ggml-ci * cont : fix comment [no ci]	2025-08-21 17:00:33 +03:00
Ed Addario	e01dad886b	Parallelise candidate evaluation	2025-08-21 12:47:13 +01:00
Ed Addario	95b2ab2800	Change error estimate to use normalised weighted MSE	2025-08-21 10:46:37 +01:00
Ed Addario	5ef493ea1a	Exclude embeddings and output tensor	2025-08-21 09:48:29 +01:00
Ed Addario	35ad0fc4ad	Improve error estimation using weighted MSE	2025-08-20 23:27:20 +01:00
Ed Addario	b0b33b7ccb	Optimise tensor sampling	2025-08-20 20:58:26 +01:00
Ed Addario	3f0118d602	Fix bias lambda bug	2025-08-20 17:26:37 +01:00
Ed Addario	52da4a4f8c	Skip if output.weight or type is COPY	2025-08-20 17:26:05 +01:00
Ed Addario	43caadf783	Add better fallbacks for IQ mixes	2025-08-20 17:24:48 +01:00
Ed Addario	29b2dc3ec0	Do not mix K and IQ quants	2025-08-20 13:27:01 +01:00
Ed Addario	5cd69a6809	Add F16/BF16 type	2025-08-20 09:41:39 +01:00
Ed Addario	b33abae231	Merge branch 'master' into quantize	2025-08-19 23:39:07 +01:00
Ed Addario	936294f6af	Increase precision for error calculation	2025-08-19 23:31:22 +01:00
Ed Addario	f22b3097eb	Avoid division by zero if truncation occurs	2025-08-19 22:34:01 +01:00
Ed Addario	ee05d6bc0b	Update comments	2025-08-19 22:32:53 +01:00
Ed Addario	5aceb9e3ae	Refactor variable names	2025-08-19 22:29:27 +01:00

608 Commits