Commit Graph

7244 Commits

Author SHA1 Message Date
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00
Ed Addario 43caadf783
Add better fallbacks for IQ mixes 2025-08-20 17:24:48 +01:00
Johannes Gäßler 7a6e91ad26
CUDA: replace GGML_CUDA_F16 with CUDA arch checks (#15433) 2025-08-20 16:58:49 +02:00
Jeff Bolz fec9519802
vulkan: shorten pipeline name strings (#15431)
These detailed strings were causing increased build time on gcc.
2025-08-20 16:33:14 +02:00
Ed Addario 29b2dc3ec0
Do not mix K and IQ quants 2025-08-20 13:27:01 +01:00
Daniel Bevenius 657b8a77bd
chat: handle gpt-oss return/end token inconsistency (#15421)
This commit addresses an inconsistency during inference by adding a new
member to the `templates_params` struct to indicate whether the chat is
in inference mode. This allows the gpt-oss specific function
`common_chat_params_init_gpt_oss` to check this flag and the
`add_generation_prompt` flag to determine if it should replace the
`<|return|>` token with the `<|end|>` token in the prompt.

The motivation for this change is to ensure that the formatted prompt of
past messages in `common_chat_format_single` matches the output of the
formatted new message. The issue is that the gpt-oss template returns
different end tags: `<|return|>` when `add_generation_prompt` is false,
and `<|end|>` when `add_generation_prompt` is true. This causes the
substring function to start at an incorrect position, resulting in
tokenization starting with 'tart|>' instead of '<|start|>'.

Resolves: https://github.com/ggml-org/llama.cpp/issues/15417
2025-08-20 14:26:01 +02:00
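The offset problem described in the commit above can be reproduced with a small sketch. This is not the llama.cpp implementation: `render`, `format_single`, and the `inference` parameter are hypothetical stand-ins, and the template is a heavily simplified assumption of how a gpt-oss-style template closes turns. It only illustrates why a prefix-length substring breaks when the same past messages render with different end tags.

```python
RETURN, END = "<|return|>", "<|end|>"


def render(messages, add_generation_prompt, inference=False):
    """Toy stand-in for a gpt-oss-style chat template (hypothetical, simplified)."""
    text = "".join(
        f"<|start|>{m['role']}<|message|>{m['content']}{END}" for m in messages
    )
    if add_generation_prompt:
        return text + "<|start|>assistant"
    if messages and messages[-1]["role"] == "assistant":
        # Without a generation prompt, the final assistant turn closes with <|return|>.
        text = text[: -len(END)] + RETURN
        if inference:
            # Sketch of the fix: during inference, swap <|return|> back to <|end|>.
            text = text[: -len(RETURN)] + END
    return text


def format_single(past, new_msg, inference):
    """Return only the text contributed by new_msg, using a prefix-length substring."""
    fmt_past = render(past, add_generation_prompt=False, inference=inference)
    fmt_all = render(past + [new_msg], add_generation_prompt=True, inference=inference)
    return fmt_all[len(fmt_past):]


past = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
new_msg = {"role": "user", "content": "bye"}

print(format_single(past, new_msg, inference=False))  # substring starts too late
print(format_single(past, new_msg, inference=True))   # substring starts at the new turn
```

In this sketch `<|return|>` is three characters longer than `<|end|>`, so without the replacement the substring skips the first three characters of the next `<|start|>` and begins with `tart|>`. With the replacement, both renders agree on the past prefix and the result begins with `<|start|>`.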
Ed Addario 69586e212e
Add F16/BF16 type 2025-08-20 13:23:11 +01:00
Jie Fu (傅杰) ec5ab1a36c
common : fix context shift help message (#15448)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-08-20 13:33:30 +03:00
xiaobing318 1a99c2d948
cmake : fix target include directories (#15450)
* Update docker.yml

Modify the docker.yml file so that the workflow no longer runs periodically; the workflow can still be started manually if needed

* feat: Modify the header file include path

1. There's no llava directory in the tools directory.
2. Because the `mtmd` CMakeLists.txt file uses the command `target_include_directories(mtmd PUBLIC .)`, other targets that link against `mtmd` automatically get the `mtmd` directory as a search path for header files. Therefore, you can either remove `target_include_directories(${TARGET} PRIVATE ../llava)` or use `target_include_directories(${TARGET} PRIVATE ../mtmd)` to explicitly require the `llama-server` target to use header files from `mtmd`.

* Restore the docker.yml file
2025-08-20 13:32:05 +03:00
Daniel Bevenius 37f10f955f
make : remove make in favor of CMake (#15449)
This commit removes the content from the Makefile and updates the
current deprecation message to state that `make` has been replaced by
CMake.

The message when `make` is invoked will now be the following:
```console
$ make
Makefile:6: *** Build system changed:
 The Makefile build has been replaced by CMake.

 For build instructions see:
 https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md

.  Stop.
```

The motivation for this is that many, if not all, targets fail to build
now, after changes to the system, and `make` has also been deprecated for
some time now.
2025-08-20 13:31:16 +03:00
Georgi Gerganov 2f37014073
lookahead : add sample command to readme (#15447)
* lookahead : add sample command to readme

* cont : build-agnostic command
2025-08-20 13:30:46 +03:00
Ed Addario 5cd69a6809
Add F16/BF16 type 2025-08-20 09:41:39 +01:00
R0CKSTAR a094f38143
musa: fix build warnings (#15258)
* musa: fix build warnings

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

* fix warning: comparison of integers of different signs: 'const int' and 'unsigned int' [-Wsign-compare]

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

---------

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-20 10:17:37 +08:00
Ed Addario b33abae231
Merge branch 'master' into quantize 2025-08-19 23:39:07 +01:00
Ed Addario 936294f6af
Increase precision for error calculation 2025-08-19 23:31:22 +01:00
Ed Addario f22b3097eb
Avoid division by zero if truncation occurs 2025-08-19 22:34:01 +01:00
Ed Addario ee05d6bc0b
Update comments 2025-08-19 22:32:53 +01:00
Ed Addario 5aceb9e3ae
Refactor variable names 2025-08-19 22:29:27 +01:00
lhez fb22dd07a6
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375) 2025-08-19 11:25:51 -07:00
Georgi Gerganov 9ef6b0b835
model : add gpt-oss type strings (#15424) 2025-08-19 19:58:28 +03:00
Gian-Carlo Pascutto 1e19f5d462
common : Add top-nsigma sampler to help globally (#15428)
Fixes #15423.
2025-08-19 19:58:14 +03:00
Georgi Gerganov d2fcd91cf9
server : disable context shift by default (#15416)
* server : disable context shift by default

ggml-ci

* server : make scope of test parameters local
2025-08-19 16:46:37 +03:00
SHUAI YANG a6d3cfe7fa
CANN: optimize rope operator (#15335)
* optimize rope ops

* amendment

* delete trailing whitespace

* change the variable name
2025-08-19 21:28:22 +08:00
R0CKSTAR 67f09a3a27
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-19 12:33:47 +02:00
Ed Addario 1187f6aa9e
Implement bpw_overrides call 2025-08-19 11:07:03 +01:00
Ed Addario 92f49ab399
Add target_bpw_type() logic 2025-08-19 11:05:01 +01:00
Ed Addario 017945a3b2
Validate if imatrix contains activations 2025-08-19 11:03:52 +01:00
Ed Addario 9adae08789
Add is_iq() 2025-08-19 11:00:50 +01:00
Ed Addario c96b8eef94
Add fallback_type enum 2025-08-19 11:00:05 +01:00
Ed Addario a22a9deeee
Refactor variable and add target_bpw 2025-08-19 10:57:44 +01:00
Ed Addario 1b3d5b5744
Populate params 2025-08-19 10:56:02 +01:00
Ed Addario e877474458
Process target_bpw parameter 2025-08-19 10:54:02 +01:00
Ed Addario 0edbf0c176
Process activations 2025-08-19 10:51:58 +01:00
Ed Addario 77b818c040
Populate activations_data with imatrix activations if present 2025-08-19 10:50:37 +01:00
Ed Addario e6d55dc47b
Load activations 2025-08-19 10:49:01 +01:00
Ed Addario 5e85fb3ff3
Add parse_target_bpw() 2025-08-19 10:46:36 +01:00
Ed Addario cfec4048ab
Update usage 2025-08-19 10:43:51 +01:00
Ed Addario 4d9491141b
Add target_bpw parameter 2025-08-19 10:43:21 +01:00
Marvin Gießing 6424594c56
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385)
* Added VSX intrinsics for Power9+ systems

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Manual unrolling for minor perf improvement

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Update ggml/src/ggml-cpu/arch/powerpc/quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-08-19 11:54:31 +03:00
Ed Addario ba7335efb3
Refactor variable name 2025-08-19 09:54:29 +01:00
Xuan-Son Nguyen e9288e8869
chat : clarify the meaning of reasoning_format (#15408)
* chat : clarify the meaning of reasoning_format

* add link to this PR
2025-08-19 10:29:36 +02:00
Georgi Gerganov 9d262f4bad
server : remove swa_full warning (#15399) 2025-08-19 08:45:26 +03:00
Georgi Gerganov f0d3c7405c
batched-bench : use rand tokens (#15398) 2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen f08c4c0d8d
mtmd : clean up clip_n_output_tokens (#15391) 2025-08-18 22:53:52 +02:00
Georgi Gerganov 6d7f1117e3
codeowners : remove mmv.* 2025-08-18 22:06:44 +03:00
Georgi Gerganov 60212f1ead
sync : ggml 2025-08-18 22:06:44 +03:00
Georgi Gerganov f0c541d315
scripts : update sync scripts 2025-08-18 22:06:44 +03:00
Sigbjørn Skjæret baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00