Ed Addario
61c0e01f50
Execute bpw_overrides() only if an imatrix file is provided
2025-08-24 13:36:03 +01:00
Ed Addario
3856d60328
Restrict quant types per family
2025-08-23 14:45:07 +01:00
Ed Addario
decafae270
Adjust bias_lambda
2025-08-23 11:30:11 +01:00
Ed Addario
68ae5e66ce
Improve list of candidate types
2025-08-23 02:50:55 +01:00
Ed Addario
73124a9921
Refactor estimate_error()
2025-08-23 02:17:22 +01:00
Ed Addario
f75265f55b
Fix typo
2025-08-23 01:08:37 +01:00
Ed Addario
9a4b115497
Explicitly add <atomic> include
2025-08-23 01:08:01 +01:00
Ed Addario
6d17889add
Log if override is from tensor-type or from bpw-target
2025-08-22 16:58:46 +01:00
Ed Addario
fea99d051a
Refactor and combine lambdas
2025-08-22 16:57:58 +01:00
Ed Addario
f05c8483d8
Improve dequantized_buffer fill
2025-08-22 09:17:58 +01:00
Ed Addario
897decbe8a
Show skipped IQ tensors
2025-08-22 09:15:11 +01:00
Ed Addario
01c927fb94
Improve Pareto-efficient candidate selection
2025-08-22 09:14:14 +01:00
Ed Addario
47cdbe2155
Reduce sampling window to speed up process
2025-08-22 09:11:11 +01:00
Ed Addario
2f13fee795
Parameterise type
2025-08-22 09:05:55 +01:00
Ed Addario
bb0d912c1f
Update comments
2025-08-22 09:02:56 +01:00
Ed Addario
35c1504441
Fix byte count for 3d or higher tensors
2025-08-22 09:01:57 +01:00
Ed Addario
ec0afbe79f
Include embeddings and output tensors
2025-08-22 01:46:09 +01:00
Ed Addario
e6eefa68f1
Merge branch 'master' into quantize
2025-08-21 19:22:24 +01:00
Ed Addario
5b6f1e9fde
General code refactor
2025-08-21 19:18:54 +01:00
Georgi Gerganov
cd36b5e5c7
llama : remove deprecated llama_kv_self API ( #15472 )
ggml-ci
2025-08-21 19:13:45 +03:00
Georgi Gerganov
3f196be84b
graph : remove build_attn_with_sinks overload ( #15469 )
ggml-ci
2025-08-21 18:44:45 +03:00
Ed Addario
9e11f82e8f
Precompute error denominator in estimate_error()
2025-08-21 16:25:31 +01:00
Ed Addario
887490c5ec
Dequantise sampled rows only
2025-08-21 15:11:49 +01:00
Georgi Gerganov
715a6db02c
kv-cache : drop the "unified" prefix ( #15467 )
* kv-cache : drop the "unified" prefix
ggml-ci
* cont : fix comment [no ci]
2025-08-21 17:00:33 +03:00
Ed Addario
e01dad886b
Parallelise candidate evaluation
2025-08-21 12:47:13 +01:00
Ed Addario
95b2ab2800
Change error estimate to use normalised weighted MSE
2025-08-21 10:46:37 +01:00
Ed Addario
5ef493ea1a
Exclude embeddings and output tensor
2025-08-21 09:48:29 +01:00
Ed Addario
35ad0fc4ad
Improve error estimation using weighted MSE
2025-08-20 23:27:20 +01:00
Ed Addario
b0b33b7ccb
Optimise tensor sampling
2025-08-20 20:58:26 +01:00
Ed Addario
3f0118d602
Fix bias lambda bug
2025-08-20 17:26:37 +01:00
Ed Addario
52da4a4f8c
Skip if output.weight or type is COPY
2025-08-20 17:26:05 +01:00
Ed Addario
43caadf783
Add better fallbacks for IQ mixes
2025-08-20 17:24:48 +01:00
Ed Addario
29b2dc3ec0
Do not mix K and IQ quants
2025-08-20 13:27:01 +01:00
Ed Addario
5cd69a6809
Add F16/BF16 type
2025-08-20 09:41:39 +01:00
Ed Addario
b33abae231
Merge branch 'master' into quantize
2025-08-19 23:39:07 +01:00
Ed Addario
936294f6af
Increase precision for error calculation
2025-08-19 23:31:22 +01:00
Ed Addario
f22b3097eb
Avoid division by zero if truncation occurs
2025-08-19 22:34:01 +01:00
Ed Addario
ee05d6bc0b
Update comments
2025-08-19 22:32:53 +01:00
Ed Addario
5aceb9e3ae
Refactor variable names
2025-08-19 22:29:27 +01:00
Georgi Gerganov
9ef6b0b835
model : add gpt-oss type strings ( #15424 )
2025-08-19 19:58:28 +03:00
Ed Addario
1187f6aa9e
Implement bpw_overrides call
2025-08-19 11:07:03 +01:00
Ed Addario
92f49ab399
Add target_bpw_type() logic
2025-08-19 11:05:01 +01:00
Ed Addario
017945a3b2
Validate if imatrix contains activations
2025-08-19 11:03:52 +01:00
Ed Addario
9adae08789
Add is_iq()
2025-08-19 11:00:50 +01:00
Ed Addario
c96b8eef94
Add fallback_type enum
2025-08-19 11:00:05 +01:00
Ed Addario
a22a9deeee
Refactor variable and add target_bpw
2025-08-19 10:57:44 +01:00
Georgi Gerganov
9d262f4bad
server : remove swa_full warning ( #15399 )
2025-08-19 08:45:26 +03:00
Sigbjørn Skjæret
baa9255a45
llama : merge conts and reshapes and remove unnecessary cont ( #15380 )
* remove unnecessary conts and merge reshapes
* restore necessary conts
* merge more conts and reshapes
* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Daniel Bevenius
7a0de96045
llama : add 18-layer model type for Gemma 3-270m ( #15319 )
This commit adds support for the 18-layer model type in the Gemma3
series, which is the size of the Gemma3-270m model.
The motivation for this commit is that this was the only change required for
Gemma3-270m to be converted to GGUF format and used with llama.cpp.
Once the model has been converted and uploaded to Hugging Face, it can be
used like this:
```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
2025-08-14 17:56:26 +02:00
Aldehir Rojas
b204a5a234
gpt-oss: implement harmony parsing ( #15181 )
* model : add harmony parser for gpt-oss
* gpt-oss : fix grammar trigger from causing empty stack
* gpt-oss: tweak the grammar trigger again
* gpt-oss : add support for recipient in role header
* gpt-oss : fix ungrouped tool calls in grammar
* gpt-oss : loosen function name matching during parse
* gpt-oss : clean up workarounds
* gpt-oss : add template tests
* gpt-oss : simulate thinking and tool call tags
* gpt-oss : undo think tags when reasoning_format is none
* gpt-oss : set special tokens back to user defined
* gpt-oss : update openai-gpt-oss template
* server : filter out harmony thought messages
* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00