Commit Graph

215 Commits

Author SHA1 Message Date
Ed Addario f6934b9417
Merge branch 'imatrix' of https://github.com/EAddario/llama.cpp into imatrix 2025-08-17 08:20:18 +01:00
Ed Addario 44ea7ddeac
Change statement order 2025-08-17 08:20:03 +01:00
Ed Addario 2e803234f4
Use { and } around conditionally-executed single line statements 2025-08-17 08:19:02 +01:00
Ed Addario a96013f720
Define one variable per line and refactor names 2025-08-17 08:16:41 +01:00
Ed Addario 12607d3203
Use { and } around single line for statement 2025-08-17 08:10:54 +01:00
Ed Addario d19e6c9afa
Use { and } around the conditionally-executed statement
Co-authored-by: compilade <git@compilade.net>
2025-08-17 08:08:26 +01:00
Ed Addario 97d839c441
Using one line per variable definition
Co-authored-by: compilade <git@compilade.net>
2025-08-17 08:06:15 +01:00
Ed Addario 4a487ea7e4
Use { and } around the conditionally-executed statement
Co-authored-by: compilade <git@compilade.net>
2025-08-17 07:26:16 +01:00
Ed Addario e3149a2168
Use the corresponding size
Co-authored-by: compilade <git@compilade.net>
2025-08-17 07:24:27 +01:00
Ed Addario d4b0d89115
Fix return type bug 2025-08-16 11:00:43 +01:00
Ed Addario 030ec53d7a
Remove unnecessary include 2025-08-16 10:46:09 +01:00
Ed Addario 8589ef4d15
Update README.md 2025-08-15 21:27:48 +01:00
Ed Addario 240a965e50
Update README.md 2025-08-15 21:24:38 +01:00
Ed Addario 42bfe3b2a3
Update stats output sort based on imatrix type 2025-08-15 21:12:56 +01:00
Ed Addario 2756617c3f
Merge branch 'master' into imatrix 2025-08-15 20:46:43 +01:00
Diego Devesa f75b830647
chat : include kwargs in template example (#15309) 2025-08-14 10:28:29 -07:00
Aldehir Rojas b204a5a234
gpt-oss: implement harmony parsing (#15181)
* model : add harmony parser for gpt-oss

* gpt-oss : fix grammar trigger from causing empty stack

* gpt-oss: tweak the grammar trigger again

* gpt-oss : add support for recipient in role header

* gpt-oss : fix ungrouped tool calls in grammar

* gpt-oss : loosen function name matching during parse

* gpt-oss : clean up workarounds

* gpt-oss : add template tests

* gpt-oss : simulate thinking and tool call tags

* gpt-oss : undo think tags when reasoning_format is none

* gpt-oss : set special tokens back to user defined

* gpt-oss : update openai-gpt-oss template

* server : filter out harmony thought messages

* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00
Georgi Gerganov d32e03f449
server : add SWA checkpoints (#15293)
* server : add SWA checkpoints

ggml-ci

* cont : server clean-up

* server : handle state restore fails

* llama : add extended llama_state_seq_ API

* server : do not make checkpoints if --swa-full

ggml-ci

* llama : remove flags value for NONE

* server : configure number of SWA checkpoints with CLI arg

ggml-ci

* args : fix scope of new argument
2025-08-14 14:59:50 +03:00
kallewoof 3ea913f1ce
perplexity: give more information about constraints on failure (#15303)
* perplexity: give more information about constraints on failure

This checks whether -np is insufficient relative to the context size, and reports how much context would be needed for each sequence.

* log formatting

* log error and return instead of storing max_seq_exceeded int

* check if s0 is zero for -np check
2025-08-14 09:16:32 +03:00
Sigbjørn Skjæret b3e16665e1
server : enable -td and -tbd parameters (#15172) 2025-08-13 15:43:00 +02:00
Copilot d8914fc47e
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191)
* Checkpoint from VS Code for coding agent session

* Initial plan

* Fix typo in --override-tensor-draft flag implementation

* Add null termination for speculative tensor buffer overrides

* Apply suggestions from code review

* Apply suggestions from code review

* Extract tensor override parsing logic to common function (addresses @slaren's feedback)

* Apply suggestions from code review

* Apply suggestions

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-13 12:44:40 +02:00
Aldehir Rojas e885445bc1
server : filter out harmony thought messages (#15278) 2025-08-13 12:28:21 +02:00
rainred cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid using hardcoded values. (#14750)
* Fix MinicpmV model converter and clip to avoid using hardcoded values.

* Code update for pr/14750

* Remove unused field, update script path in docs.

* Add version 5 for fallback code.

---------

Co-authored-by: lzhang <zhanglei@modelbest.cn>
2025-08-11 16:12:12 +02:00
Xuan-Son Nguyen 53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238) 2025-08-11 14:48:41 +02:00
Daniel Bevenius 1ebbaddff2
perplexity : update comments/error msg to use decode [no ci] (#15227)
This commit updates comments and error messages to use "decode" instead
of "eval" in perplexity.cpp.

The motivation for this is that `llama_eval` was renamed to
`llama_decode` a while ago, but the comments and error messages
still referred to "eval". This change ensures consistency and clarity.
2025-08-11 11:21:24 +03:00
Ed Addario 89051cda35
Update README.md 2025-08-09 14:49:44 +01:00
Ed Addario dcac206f8e
Add --activation-statistics logic to avoid doubling the imatrix size by default 2025-08-09 14:49:25 +01:00
Ed Addario 6fe51e12f1
Fix typo in ECS formula 2025-08-09 09:12:23 +01:00
Ed Addario 59af5034f7
Update README.md 2025-08-09 01:26:23 +01:00
Ed Addario c5ecdaa1a1
Add Euclidean–Cosine Score (ECS) 2025-08-07 19:04:49 +01:00
Ed Addario 5bb2def02d
Add --activation-statistics parameter 2025-08-07 17:41:21 +01:00
Ed Addario dadd90ef73
Rename report heading 2025-08-07 14:07:48 +01:00
Ed Addario e0d6471340
Reverse conditional logic to match convention 2025-08-07 12:04:52 +01:00
Ed Addario 3e9d53c61e
Refactor variable names 2025-08-07 12:03:24 +01:00
Ed Addario c7959edff5
Merge branch 'master' into imatrix 2025-08-07 11:51:33 +01:00
Daniel Bevenius 36d3f00e14
requirements : fix PyTorch uint64 compatibility (#15134)
This commit addresses an issue with the convert_hf_to_gguf script
which is currently failing with:
```console
AttributeError: module 'torch' has no attribute 'uint64'
```

This occurred because safetensors expects torch.uint64 to be available
in the public API, but PyTorch 2.2.x provides only limited support for
unsigned types beyond uint8. The torch.uint64 dtype exists but
is not exposed in the standard torch namespace
(see pytorch/pytorch#58734).

PyTorch 2.4.0 properly exposes torch.uint64 in the public API, resolving
the compatibility issue with safetensors. This also required torchvision
to be updated to 0.19.0 for compatibility.

Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/186#68938de803e47d990aa087fb
Refs: https://github.com/pytorch/pytorch/issues/58734
2025-08-07 05:31:48 +02:00
Juk Armstrong 476aa3fd57
Fixed name `-override-tensors` to `-override-tensor` (#15129) 2025-08-06 17:28:48 +01:00
Ed Addario 030ed3c909
Merge branch 'master' into imatrix 2025-08-05 21:58:00 +01:00
Georgi Gerganov fd1234cb46
llama : add gpt-oss (#15091)
* oai moe

* compat with new checkpoint

* add attn sink impl

* add rope scaling yarn

* logits match with latest transformers code

* wip chat template

* rm trailing space

* use ggml_scale_bias

* rm redundant is_swa_all

* convert interleaved gate_up

* graph : fix activation function to match reference (#7)

* vocab : handle o200k_harmony special tokens

* ggml : add attention sinks support (#1)

* llama : add attn sinks

* ggml : add attn sinks

* cuda : add attn sinks

* vulkan : add support for sinks in softmax

remove unnecessary return

* ggml : add fused swiglu_oai op (#11)

* ggml : add fused swiglu_oai op

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update CUDA impl

* cont : metal impl

* add vulkan impl

* test-backend-ops : more test cases, clean up

* llama : remove unfused impl

* remove extra lines

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

* repack mxfp4 upon conversion

* clean up a bit

* enable thinking

* add quick hack to render only some special tokens

* fix bf16 conversion

* remove vocab hack

* webui ok

* support chat parsing for gpt-oss

* fix webui

* direct mapping mxfp4, FINALLY

* force using mxfp4

* properly use lazy tensor

* ggml : add mxfp4

ggml : use e8m0 conversion instead of powf

Co-authored-by: Diego Devesa <slarengh@gmail.com>

change kvalues_mxfp4 table to match e2m1 (#6)

metal : remove quantization for now (not used)

cuda : fix disabled CUDA graphs due to ffn moe bias

vulkan : add support for mxfp4

cont : add cm2 dequant

* ggml : add ggml_add_id (#13)

* ggml : add ggml_add_id

* add cuda impl

* llama : add weight support check for add_id

* perf opt

* add vulkan impl

* rename cuda files

* add metal impl

* allow in-place ggml_add_id

* llama : keep biases on CPU with --cpu-moe

* llama : fix compile error

ggml-ci

* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw

ggml-ci

* cleanup

ggml-ci

* sycl : fix supports_op for MXFP4

ggml-ci

* fix Unknown reasoning format

* ggml-cpu : fix AVX build

ggml-ci

* fix hip build

ggml-ci

* cuda : add mxfp4 dequantization support for cuBLAS

ggml-ci

* ggml-cpu : fix mxfp4 fallback definitions for some architectures

ggml-ci

* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>
2025-08-05 22:10:36 +03:00
Ed Addario 88854c9179
Refactor legacy mode 2025-08-05 14:16:45 +01:00
Ed Addario 4c3fea89d6
Update report layout 2025-08-05 13:32:59 +01:00
Ed Addario 49996a19da
Refactor variable names 2025-08-05 13:32:46 +01:00
Ed Addario aea9b31db5
Make ZD Score two-tailed 2025-08-05 12:57:13 +01:00
Alex Wu 22f060c9c4
webui: fix markdown table (#15081)
* webui: fix markdown table

* webui: fix table display with themes
2025-08-05 13:56:44 +02:00
Ed Addario 906548a00a
Update aggregated sum of squared activations per layer 2025-08-05 12:06:19 +01:00
Ed Addario b37393423d
Compute aggregated (per layer) l2 norm 2025-08-05 08:54:57 +01:00
Ed Addario 5e40cf4f1c
Do not resize if in_sum is null 2025-08-05 00:18:53 +01:00
compilade 19f68fa5a4
imatrix : warn when GGUF imatrix is saved without .gguf suffix (#15076)
* imatrix : add warning when suffix is not .gguf for GGUF imatrix

* imatrix : only warn about suffix when output format is unspecified
2025-08-04 23:26:52 +02:00
Ed Addario adbff66394
Merge branch 'master' into imatrix 2025-08-04 22:16:10 +01:00
Ed Addario c39c4e2a33
Refactor variable name 2025-08-04 22:15:50 +01:00