Commit Graph

608 Commits

Author SHA1 Message Date
Ed Addario 66aff8fa1e
Add precise_lambda() 2025-08-28 16:06:42 +01:00
Ed Addario 8df1d00ae4
Add directional scaling 2025-08-28 16:04:28 +01:00
Ed Addario 04946114c9
Refactor epsilon into a function-wide variable 2025-08-28 16:01:03 +01:00
Ed Addario 4286690019
Minor comment update 2025-08-26 21:39:40 +01:00
Ed Addario ccaab24441
Merge branch 'master' into quantize 2025-08-24 20:47:53 +01:00
Ed Addario d4ac2106fb
Improve logging and some minor code refactoring 2025-08-24 13:39:10 +01:00
Ed Addario 61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided 2025-08-24 13:36:03 +01:00
Georgi Gerganov b730706a49
kv-cache : support layer reuse (#15504)
* kv-cache : support layer reuse

ggml-ci

* cont : update comments [no ci]
2025-08-24 13:07:07 +03:00
Ed Addario 3856d60328
Restrict quant types per family 2025-08-23 14:45:07 +01:00
Piotr Wilkin (ilintar) b1afcab804
model : add support for Seed-OSS (#15490)
* First draft

* Fix linter errors

* Added missing sinks nullptr

* Don't forget the llama-arch!

* We're through to the generation stage.

* Fix post-attention norm

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Fix RoPE type

* Fix tensor name and reorder llm_types

* Update gguf-py/gguf/constants.py

Remove nonexistent FFN_POST_NORM tensor

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-model.h

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add basic chat template

* Add chat template tests

* Remake chat template test

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update src/llama-chat.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Reorder llm type descriptions

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-23 15:21:52 +02:00
Ed Addario decafae270
Adjust bias_lambda 2025-08-23 11:30:11 +01:00
LaffeyNyaa 21dc4ddaf2
chat : fix debug build assertion in trim function (#15520) 2025-08-23 10:38:30 +02:00
Ed Addario 68ae5e66ce
Improve list of candidate types 2025-08-23 02:50:55 +01:00
Ed Addario 73124a9921
Refactor estimate_error() 2025-08-23 02:17:22 +01:00
Ed Addario f75265f55b
Fix typo 2025-08-23 01:08:37 +01:00
Ed Addario 9a4b115497
Explicitly adding <atomic> include 2025-08-23 01:08:01 +01:00
Ed Addario 6d17889add
Log if override is from tensor-type or from bpw-target 2025-08-22 16:58:46 +01:00
Ed Addario fea99d051a
Refactor and combine lambdas 2025-08-22 16:57:58 +01:00
Georgi Gerganov 9ebebef62f
llama : remove KV cache defragmentation logic (#15473)
ggml-ci
2025-08-22 12:22:13 +03:00
Ed Addario f05c8483d8
Improve dequantized_buffer fill 2025-08-22 09:17:58 +01:00
Ed Addario 897decbe8a
Show skipped IQ tensors 2025-08-22 09:15:11 +01:00
Ed Addario 01c927fb94
Improve pareto efficient candidate selection 2025-08-22 09:14:14 +01:00
Ed Addario 47cdbe2155
Reduce sampling window to speed up process 2025-08-22 09:11:11 +01:00
Ed Addario 2f13fee795
Parameterise type 2025-08-22 09:05:55 +01:00
Ed Addario bb0d912c1f
Update comments 2025-08-22 09:02:56 +01:00
Ed Addario 35c1504441
Fix byte count for 3d or higher tensors 2025-08-22 09:01:57 +01:00
Tarek Dakhran e288693669
readme : model : mtdm : lfm2 improvements (#15476)
* Support untied embeddings

* Increase number of image tokens to 1024

* Add LFM2-VL to readme

* Actually use untied embeddings
2025-08-22 09:29:08 +02:00
Ed Addario ec0afbe79f
Include embeddings and output tensors 2025-08-22 01:46:09 +01:00
Ed Addario e6eefa68f1
Merge branch 'master' into quantize 2025-08-21 19:22:24 +01:00
Ed Addario 5b6f1e9fde
General code refactor 2025-08-21 19:18:54 +01:00
Georgi Gerganov cd36b5e5c7
llama : remove deprecated llama_kv_self API (#15472)
ggml-ci
2025-08-21 19:13:45 +03:00
Georgi Gerganov 3f196be84b
graph : remove build_attn_with_sinks overload (#15469)
ggml-ci
2025-08-21 18:44:45 +03:00
Ed Addario 9e11f82e8f
Precompute error denominator in estimate_error() 2025-08-21 16:25:31 +01:00
Ed Addario 887490c5ec
Dequantise sampled rows only 2025-08-21 15:11:49 +01:00
Georgi Gerganov 715a6db02c
kv-cache : drop the "unified" prefix (#15467)
* kv-cache : drop the "unified" prefix

ggml-ci

* cont : fix comment [no ci]
2025-08-21 17:00:33 +03:00
Ed Addario e01dad886b
Parallelise candidate evaluation 2025-08-21 12:47:13 +01:00
Ed Addario 95b2ab2800
Change error estimate to use normalised weighted MSE 2025-08-21 10:46:37 +01:00
Ed Addario 5ef493ea1a
Exclude embeddings and output tensor 2025-08-21 09:48:29 +01:00
Ed Addario 35ad0fc4ad
Improve error estimation using weighted MSE 2025-08-20 23:27:20 +01:00
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00
Ed Addario 43caadf783
Add better fallbacks for IQ mixes 2025-08-20 17:24:48 +01:00
Ed Addario 29b2dc3ec0
Do not mix K and IQ quants 2025-08-20 13:27:01 +01:00
Ed Addario 5cd69a6809
Add F16/BF16 type 2025-08-20 09:41:39 +01:00
Ed Addario b33abae231
Merge branch 'master' into quantize 2025-08-19 23:39:07 +01:00
Ed Addario 936294f6af
Increase precision for error calculation 2025-08-19 23:31:22 +01:00
Ed Addario f22b3097eb
Avoid division by zero if truncation occurs 2025-08-19 22:34:01 +01:00
Ed Addario ee05d6bc0b
Update comments 2025-08-19 22:32:53 +01:00
Ed Addario 5aceb9e3ae
Refactor variable names 2025-08-19 22:29:27 +01:00