Commit Graph

597 Commits

Author SHA1 Message Date
Ed Addario 61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided 2025-08-24 13:36:03 +01:00
Ed Addario 3856d60328
Restrict quant types per family 2025-08-23 14:45:07 +01:00
Ed Addario decafae270
Adjust bias_lambda 2025-08-23 11:30:11 +01:00
Ed Addario 68ae5e66ce
Improve list of candidate types 2025-08-23 02:50:55 +01:00
Ed Addario 73124a9921
Refactor estimate_error() 2025-08-23 02:17:22 +01:00
Ed Addario f75265f55b
Fix typo 2025-08-23 01:08:37 +01:00
Ed Addario 9a4b115497
Explicitly adding <atomic> include 2025-08-23 01:08:01 +01:00
Ed Addario 6d17889add
Log if override is from tensor-type or from bpw-target 2025-08-22 16:58:46 +01:00
Ed Addario fea99d051a
Refactor and combine lambdas 2025-08-22 16:57:58 +01:00
Ed Addario f05c8483d8
Improve dequantized_buffer fill 2025-08-22 09:17:58 +01:00
Ed Addario 897decbe8a
Show skipped IQ tensors 2025-08-22 09:15:11 +01:00
Ed Addario 01c927fb94
Improve pareto efficient candidate selection 2025-08-22 09:14:14 +01:00
Ed Addario 47cdbe2155
Reduce sampling window to speed up process 2025-08-22 09:11:11 +01:00
Ed Addario 2f13fee795
Parameterise type 2025-08-22 09:05:55 +01:00
Ed Addario bb0d912c1f
Update comments 2025-08-22 09:02:56 +01:00
Ed Addario 35c1504441
Fix byte count for 3d or higher tensors 2025-08-22 09:01:57 +01:00
Ed Addario ec0afbe79f
Include embeddings and output tensors 2025-08-22 01:46:09 +01:00
Ed Addario e6eefa68f1
Merge branch 'master' into quantize 2025-08-21 19:22:24 +01:00
Ed Addario 5b6f1e9fde
General code refactor 2025-08-21 19:18:54 +01:00
Georgi Gerganov cd36b5e5c7
llama : remove deprecated llama_kv_self API (#15472)
ggml-ci
2025-08-21 19:13:45 +03:00
Georgi Gerganov 3f196be84b
graph : remove build_attn_with_sinks overload (#15469)
ggml-ci
2025-08-21 18:44:45 +03:00
Ed Addario 9e11f82e8f
Precompute error denominator in estimate_error() 2025-08-21 16:25:31 +01:00
Ed Addario 887490c5ec
Dequantise sampled rows only 2025-08-21 15:11:49 +01:00
Georgi Gerganov 715a6db02c
kv-cache : drop the "unified" prefix (#15467)
* kv-cache : drop the "unified" prefix

ggml-ci

* cont : fix comment [no ci]
2025-08-21 17:00:33 +03:00
Ed Addario e01dad886b
Parallelise candidate evaluation 2025-08-21 12:47:13 +01:00
Ed Addario 95b2ab2800
Change error estimate to use normalised weighted MSE 2025-08-21 10:46:37 +01:00
Ed Addario 5ef493ea1a
Exclude embeddings and output tensor 2025-08-21 09:48:29 +01:00
Ed Addario 35ad0fc4ad
Improve error estimation using weighted MSE 2025-08-20 23:27:20 +01:00
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00
Ed Addario 43caadf783
Add better fallbacks for IQ mixes 2025-08-20 17:24:48 +01:00
Ed Addario 29b2dc3ec0
Do not mix K and IQ quants 2025-08-20 13:27:01 +01:00
Ed Addario 5cd69a6809
Add F16/BF16 type 2025-08-20 09:41:39 +01:00
Ed Addario b33abae231
Merge branch 'master' into quantize 2025-08-19 23:39:07 +01:00
Ed Addario 936294f6af
Increase precision for error calculation 2025-08-19 23:31:22 +01:00
Ed Addario f22b3097eb
Avoid division by zero if truncation occurs 2025-08-19 22:34:01 +01:00
Ed Addario ee05d6bc0b
Update comments 2025-08-19 22:32:53 +01:00
Ed Addario 5aceb9e3ae
Refactor variable names 2025-08-19 22:29:27 +01:00
Georgi Gerganov 9ef6b0b835
model : add gpt-oss type strings (#15424) 2025-08-19 19:58:28 +03:00
Ed Addario 1187f6aa9e
Implement bpw_overrides call 2025-08-19 11:07:03 +01:00
Ed Addario 92f49ab399
Add target_bpw_type() logic 2025-08-19 11:05:01 +01:00
Ed Addario 017945a3b2
Validate if imatrix contains activations 2025-08-19 11:03:52 +01:00
Ed Addario 9adae08789
Add is_iq() 2025-08-19 11:00:50 +01:00
Ed Addario c96b8eef94
Add fallback_type enum 2025-08-19 11:00:05 +01:00
Ed Addario a22a9deeee
Refactor variable and add target_bpw 2025-08-19 10:57:44 +01:00
Georgi Gerganov 9d262f4bad
server : remove swa_full warning (#15399) 2025-08-19 08:45:26 +03:00
Sigbjørn Skjæret baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Daniel Bevenius 7a0de96045
llama : add 18-layer model type for Gemma 3-270m (#15319)
This commit adds support for the 18-layer model type in the Gemma3
series, which is the size of the Gemma3-270m model.

The motivation for this commit is that it was the only change required for
Gemma3-270m to be converted to GGUF format and used with llama.cpp.

Once the model has been converted and uploaded to Huggingface it can be
used like this:
```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
2025-08-14 17:56:26 +02:00
Aldehir Rojas b204a5a234
gpt-oss: implement harmony parsing (#15181)
* model : add harmony parser for gpt-oss

* gpt-oss : fix grammar trigger from causing empty stack

* gpt-oss: tweak the grammar trigger again

* gpt-oss : add support for recipient in role header

* gpt-oss : fix ungrouped tool calls in grammar

* gpt-oss : loosen function name matching during parse

* gpt-oss : clean up workarounds

* gpt-oss : add template tests

* gpt-oss : simulate thinking and tool call tags

* gpt-oss : undo think tags when reasoning_format is none

* gpt-oss : set special tokens back to user defined

* gpt-oss : update openai-gpt-oss template

* server : filter out harmony thought messages

* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00