Commit Graph

229 Commits

Author SHA1 Message Date
Ed Addario 5aca2561a1
Merge branch 'master' into imatrix 2025-08-21 22:19:54 +01:00
Michael Giba b108e42904
ci : fix -Werror=return-type in clip.cpp so ci/run.sh can run without issue (#15221)
* Fix -Werror=return-type so ci/run.sh can run

* Update tools/mtmd/clip.cpp

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* Remove false now that we have abort

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-21 12:06:46 +02:00
stduhpf 1b0db8f6e0
server : fix webui (#15462)
* Fix webui crash after streaming

* build webui
2025-08-21 08:19:22 +03:00
teo 1bc664a26a
server: fix OpenAI API compatibility for usage statistics in chat streams (#15444) 2025-08-21 00:10:08 +02:00
xiaobing318 1a99c2d948
cmake : fix target include directories (#15450)
* Update docker.yml

修改docker.yml文件中的内容使其停止周期性的运行该workflow,如果想要运行该workflow可以手动启动

* feat:Modify the header file include path

1. There's no llava directory in the tools directory.
2. Because the command `target_include_directories(mtmd PUBLIC .)` is used in the `mtmd` CMakeLists.txt file, other targets that link against `mtmd` automatically include the `mtmd` directory as a search path for header files. Therefore, you can remove `target_include_directories(${TARGET} PRIVATE ../llava`` or use `target_include_directories(${TARGET} PRIVATE ../mtmd`` to explicitly require the `llama-server` target to use header files from `mtmd`.

* Restore the docker.yml file
2025-08-20 13:32:05 +03:00
Georgi Gerganov d2fcd91cf9
server : disable context shift by default (#15416)
* server : disable context shift by default

ggml-ci

* server : make scopr of test parameters local
2025-08-19 16:46:37 +03:00
Georgi Gerganov f0d3c7405c
batched-bench : use rand tokens (#15398) 2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen f08c4c0d8d
mtmd : clean up clip_n_output_tokens (#15391) 2025-08-18 22:53:52 +02:00
Sigbjørn Skjæret baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
davidef d1d8241600
server : fix incoming tasks not process in order (#15395) 2025-08-18 17:51:42 +03:00
Oleksandr Kuvshynov e5155e6986
server : export max observed n_past value (#15361)
Add tracking for high watermark cache usage and make it available in /metrics endpoint.

Use-case: Tracking largest needed cache usage under realistic workload
to better understand memory requirements and be able to adjust
cache size/quantization for model/cache accordingly.
2025-08-18 00:28:58 +02:00
Ed Addario 630750fdef
Validate number of elements if in_sum is present 2025-08-17 09:42:18 +01:00
Ed Addario 1f72bc157f
Avoid using if statements with initialiser 2025-08-17 08:35:17 +01:00
Ed Addario f6934b9417
Merge branch 'imatrix' of https://github.com/EAddario/llama.cpp into imatrix 2025-08-17 08:20:18 +01:00
Ed Addario 44ea7ddeac
Change statement order 2025-08-17 08:20:03 +01:00
Ed Addario 2e803234f4
Use { and } around conditionally-executed single line statements 2025-08-17 08:19:02 +01:00
Ed Addario a96013f720
Define one variable per line and refactor names 2025-08-17 08:16:41 +01:00
Ed Addario 12607d3203
Use { and } around single line for statement 2025-08-17 08:10:54 +01:00
Ed Addario d19e6c9afa
Use { and } around the conditionally-executed statement
Co-authored-by: compilade <git@compilade.net>
2025-08-17 08:08:26 +01:00
Ed Addario 97d839c441
Using one line per variable definition
Co-authored-by: compilade <git@compilade.net>
2025-08-17 08:06:15 +01:00
Ed Addario 4a487ea7e4
Use { and } around the conditionally-executed statement
Co-authored-by: compilade <git@compilade.net>
2025-08-17 07:26:16 +01:00
Ed Addario e3149a2168
Use the corresponding size
Co-authored-by: compilade <git@compilade.net>
2025-08-17 07:24:27 +01:00
Tarek Dakhran 65349f26f2
model : support vision LiquidAI LFM2-VL family (#15347)
* wip lfm2 vision model

* Fix conv weight

* Implement dynamic resolution

* Fix cuda

* support LFM2-VL-450M

* happy CI

* Remove extra `ggml_conv` and put others into the right place

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-08-16 23:33:54 +02:00
Ed Addario d4b0d89115
Fix return type bug 2025-08-16 11:00:43 +01:00
Ed Addario 030ec53d7a
Remove unnecessary include 2025-08-16 10:46:09 +01:00
Ed Addario 8589ef4d15
Update README.md 2025-08-15 21:27:48 +01:00
Ed Addario 240a965e50
Update README.md 2025-08-15 21:24:38 +01:00
Ed Addario 42bfe3b2a3
Update stats output sort based on imatrix type 2025-08-15 21:12:56 +01:00
Ed Addario 2756617c3f
Merge branch 'master' into imatrix 2025-08-15 20:46:43 +01:00
Diego Devesa f75b830647
chat : include kwargs in template example (#15309) 2025-08-14 10:28:29 -07:00
Aldehir Rojas b204a5a234
gpt-oss: implement harmony parsing (#15181)
* model : add harmony parser for gpt-oss

* gpt-oss : fix grammar trigger from causing empty stack

* gpt-oss: tweak the grammar trigger again

* gpt-oss : add support for recipient in role header

* gpt-oss : fix ungrouped tool calls in grammar

* gpt-oss : loosen function name matching during parse

* gpt-oss : clean up workarounds

* gpt-oss : add template tests

* gpt-oss : simulate thinking and tool call tags

* gpt-oss : undo think tags when reasoning_format is none

* gpt-oss : set special tokens back to user defined

* gpt-oss : update openai-gpt-oss template

* server : filter out harmony thought messages

* gpt-oss : simplify parsing
2025-08-14 17:23:11 +03:00
Georgi Gerganov d32e03f449
server : add SWA checkpoints (#15293)
* server : add SWA checkpoints

ggml-ci

* cont : server clean-up

* server : handle state restore fails

* llama : add extended llama_state_seq_ API

* server : do not make checkpoints if --swa-full

ggml-ci

* llama : remove flags value for NONE

* server : configure number of SWA checkpoints with CLI arg

ggml-ci

* args : fix scope of new argument
2025-08-14 14:59:50 +03:00
kallewoof 3ea913f1ce
perplexity: give more information about constraints on failure (#15303)
* perplexity: give more information about constraints on failure

This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.

* log formatting

* log error and return instead of storing max_seq_exceeded int

* check if s0 is zero for -np check
2025-08-14 09:16:32 +03:00
Sigbjørn Skjæret b3e16665e1
server : enable -td and -tbd parameters (#15172) 2025-08-13 15:43:00 +02:00
Copilot d8914fc47e
common : add --override-tensor-draft, --cpu-moe-draft and --n-cpu-moe-draft parameters (#15191)
* Checkpoint from VS Code for coding agent session

* Initial plan

* Fix typo in --override-tensor-draft flag implementation

* Add null termination for speculative tensor buffer overrides

* Apply suggestions from code review

* Apply suggestions from code review

* Extract tensor override parsing logic to common function (addresses @slaren's feedback)

* Apply suggestions from code review

* Apply suggestions

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-08-13 12:44:40 +02:00
Aldehir Rojas e885445bc1
server : filter out harmony thought messages (#15278) 2025-08-13 12:28:21 +02:00
rainred cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. (#14750)
* Fix MinicpmV model converter and clip to avoid using hardcode.

* Code update for pr/14750

* Remove unused field, update script path in docs.

* Add version 5 for fallback code.

---------

Co-authored-by: lzhang <zhanglei@modelbest.cn>
2025-08-11 16:12:12 +02:00
Xuan-Son Nguyen 53d0a12658
server : allow specifying reasoning_format in HTTP request (#15238) 2025-08-11 14:48:41 +02:00
Daniel Bevenius 1ebbaddff2
perplexity : update comments/error msg to use decode [no ci] (#15227)
This commit updates comments and error messages to use "decode" instead
of "eval" in perplexity.cpp.

The motivation for this is that `llama_eval` was renamed to
`llama_decode` a while ago, but the comments and error messages
still referred to "eval". This change ensures consistency and clarity.
2025-08-11 11:21:24 +03:00
Ed Addario 89051cda35
Update README.md 2025-08-09 14:49:44 +01:00
Ed Addario dcac206f8e
Add --activation-statistics logic to avoid doubling the imatrix size by default 2025-08-09 14:49:25 +01:00
Ed Addario 6fe51e12f1
Fix typo in ECS formula 2025-08-09 09:12:23 +01:00
Ed Addario 59af5034f7
Update README.md 2025-08-09 01:26:23 +01:00
Ed Addario c5ecdaa1a1
Add Euclidean–Cosine Score (ECS) 2025-08-07 19:04:49 +01:00
Ed Addario 5bb2def02d
Add --activation-statistics parameter 2025-08-07 17:41:21 +01:00
Ed Addario dadd90ef73
Rename report heading 2025-08-07 14:07:48 +01:00
Ed Addario e0d6471340
Reverse conditional logic to match convention 2025-08-07 12:04:52 +01:00
Ed Addario 3e9d53c61e
Refactor variable names 2025-08-07 12:03:24 +01:00
Ed Addario c7959edff5
Merge branch 'master' into imatrix 2025-08-07 11:51:33 +01:00
Daniel Bevenius 36d3f00e14
requirements : fix PyTorch uint64 compatibility (#15134)
This commit addresses an issue with the convert_hf_to_gguf script
which is currently failing with:
```console
AttributeError: module 'torch' has no attribute 'uint64'
```

This occurred because safetensors expects torch.uint64 to be available
in the public API, but PyTorch 2.2.x only provides limited support for
unsigned types beyond uint8 it seems. The torch.uint64 dtype exists but
is not exposed in the standard torch namespace
(see pytorch/pytorch#58734).

PyTorch 2.4.0 properly exposes torch.uint64 in the public API, resolving
the compatibility issue with safetensors. This also required torchvision
to updated to =0.19.0 for compatibility.

Refs: https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/186#68938de803e47d990aa087fb
Refs: https://github.com/pytorch/pytorch/issues/58734
2025-08-07 05:31:48 +02:00