ddh0
6934780669
optimize
2025-12-14 16:26:15 -06:00
ddh0
36b526d768
Merge branch 'master' into power-law-sampler
2025-12-14 15:43:49 -06:00
Xuan-Son Nguyen
52392291b2
preset: handle negated arg, reverse the meaning if needed (#18041)
2025-12-14 22:08:10 +01:00
Sigbjørn Skjæret
5c8a717128
convert : refactor rope scaling handling (#18013)
...
* refactor rope scaling handling
* ws--
* missed a couple
* use find_hparam
2025-12-14 16:04:37 +01:00
Haowei Wu
37f5a1093b
mtmd: enhance image resizing in llava_uhd (#18014)
2025-12-14 15:57:52 +01:00
Ruben Ortlam
9e6649ecf2
vulkan: fix mul_mat_vec_iq1_s formatting (#18026)
2025-12-14 14:52:46 +01:00
Xuan-Son Nguyen
0759b09c90
graph: add f_attn_temp_offset (#18025)
2025-12-14 13:05:59 +01:00
ddh0
667b70fdac
update default decay
2025-12-14 03:41:28 -06:00
ddh0
ec54fe5f14
no, but does this?
2025-12-14 02:54:14 -06:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes (#17937)
...
* common : refactor common_sampler + grammar logic changes
* tests : increase max_tokens to get needed response
* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Jeff Bolz
3238b1400c
vulkan: Fix data race/hang in scalar/cm1 flash attention (#17887)
2025-12-14 09:00:00 +01:00
ddh0
2a3f579d1f
does this fix it?
2025-12-14 01:55:02 -06:00
lovedheart
4722671641
vulkan: improve mul_mat_vec_iq1_s speed (#17874)
2025-12-14 08:47:49 +01:00
Eve
d15d177f43
vulkan: faster q6_k matmul (#17813)
...
* q6_k faster mul mat
* 8 values
* fix comment
* switch to two at a time
* start ci for .glsl files
2025-12-14 08:29:37 +01:00
Georgi Gerganov
77ad8542bd
model-conversion : cast logits to float32 (#18009)
2025-12-14 08:58:13 +02:00
ddh0
9613c48172
with logging
2025-12-14 00:36:59 -06:00
Georgi Gerganov
609a2d0268
models : fix YaRN regression + consolidate logic (#18006)
...
* models : fix YaRN regression + consolidate logic
* cont : fix the fix
* cont : remove header
* cont : add header
2025-12-14 08:34:56 +02:00
Georgi Gerganov
a63cbafbbc
ggml : arm repack fix build
2025-12-14 08:33:51 +02:00
Georgi Gerganov
0e59224990
sync : ggml
2025-12-14 08:33:51 +02:00
Georgi Gerganov
71fdcf0616
ggml : arm repack fix build (whisper/0)
2025-12-14 08:33:51 +02:00
Congcong Cai
615655aafe
cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build (ggml/1394)
...
Some backends depend on CMAKE_RUNTIME_OUTPUT_DIRECTORY to create temporary files, e.g. the Metal backend.
A missing CMAKE_RUNTIME_OUTPUT_DIRECTORY can cause cmake errors such as permission denied (attempting to copy files to root).
This PR sets up a default path for CMAKE_RUNTIME_OUTPUT_DIRECTORY when it is not already set.
2025-12-14 08:33:51 +02:00
ddh0
d1e5c60442
add missing values to `common_params_sampling::print()`
2025-12-13 23:26:03 -06:00
ddh0
965bcc9dc4
fix leftover `window_size`
2025-12-13 22:19:15 -06:00
ddh0
b8a9626a73
oops forgot args.cpp
2025-12-13 22:17:08 -06:00
ddh0
a96ddd743a
re-write + change parameters + simplify
2025-12-13 22:15:03 -06:00
ddh0
67a733670e
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-13 17:27:35 -06:00
Xuan-Son Nguyen
c00ff929dc
scripts: add script to compare logprobs of llama.cpp against other frameworks (#17947)
...
* scripts: add script to compare logits of llama.cpp against other frameworks
* accept custom prompt file
* fix code style
* clarify endpoint
* fix displaying
* use abs for diff
* fix vllm case
* rm output file
* rename to compare-logprobs
* add "pattern"
2025-12-13 22:33:29 +01:00
Sergey Fedorov
4ed2bae50d
server-models.cpp: add missing <filesystem> (#18000)
...
Fixes: https://github.com/ggml-org/llama.cpp/issues/17999
2025-12-13 22:02:43 +01:00
Jeff Bolz
5266379bca
llama_context: synchronize before reallocating output buffer (#17974)
2025-12-13 09:19:51 -06:00
Xuan-Son Nguyen
4d5ae24c0a
arg: fix common_params_parse not accepting negated arg (#17991)
2025-12-13 12:53:37 +01:00
Gustavo Rocha Dias
66ba51252e
cmake: correct scope - link ws2_32 for MinGW/w64devkit builds in cpp-httplib (#17972)
...
* fix - w64devkit build
* fix - w64devkit build private scope
2025-12-13 12:46:36 +01:00
Jeff Bolz
36255a2268
vulkan: support get_rows for i32 (#17941)
2025-12-13 10:12:53 +01:00
Jeff Bolz
3229a23fa6
vulkan: support GGML_OP_DIAG (#17893)
2025-12-13 10:07:49 +01:00
Jeff Bolz
303f8615e9
vulkan: Multi-pass softmax for large number of cols (#17892)
...
When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
2025-12-13 10:04:29 +01:00
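The three-phase scheme described in the commit above can be sketched in plain Python, with each chunk standing in for one workgroup's slice of the row. This is an illustrative sketch of the algorithm, not the shader's actual layout; the function name and chunk size are made up for the example:

```python
import math

def multipass_softmax(row, chunk=4):
    # Split the row across "workgroups" (chunks), as done when
    # the number of columns is too large for a single workgroup.
    chunks = [row[i:i + chunk] for i in range(0, len(row), chunk)]

    # Phase 1: each workgroup computes its partial max.
    max_partials = [max(c) for c in chunks]

    # Phase 2: take the max of the partials, then each workgroup
    # computes its partial sum of exp(x - max).
    m = max(max_partials)
    sum_partials = [sum(math.exp(x - m) for x in c) for c in chunks]

    # Phase 3: sum the partials and emit the scaled result.
    s = sum(sum_partials)
    return [math.exp(x - m) / s for x in row]
```

In the shader the partial max and partial sums would live in temp buffers between dispatches; here they are just intermediate lists, but the numerical result matches a single-pass softmax.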
Georgi Gerganov
3c6391e748
speculative-simple : free batch on exit (#17985)
2025-12-13 09:48:34 +02:00
Sigbjørn Skjæret
8e4d678528
common : skip model validation when --completion-bash is requested (#17975)
2025-12-13 08:40:50 +01:00
Jeff Bolz
07a10c1090
vulkan: Allow non-pow2 n_experts in topk_moe (#17872)
2025-12-13 08:40:04 +01:00
Sigbjørn Skjæret
2bc94e7928
add llama-completion to completion-bash executables (#17976)
2025-12-13 08:35:50 +01:00
Daniel Bevenius
fd1085ffb7
model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984)
...
* model-conversion : use CONVERTED_MODEL value for converted model [no ci]
This commit updates the model verification scripts to use the
CONVERTED_MODEL environment variable instead of using the MODEL_PATH
(the original model path) as the basis for the converted model file
name.
The motivation for this is that currently, if the converted model file
name differs from the original model directory/name, the verification
scripts will look for the wrong .bin files that were generated when
running the models.
For example, the following steps were not possible:
```console
(venv) $ huggingface-cli download google/gemma-3-270m-it --local-dir ggml-org/gemma-3-270m
(venv) $ python3 convert_hf_to_gguf.py ggml-org/gemma-3-270m --outfile test-bf16.gguf --outtype bf16
(venv) $ cd examples/model-conversion/
(venv) $ export MODEL_PATH=../../ggml-org/gemma-3-270m
(venv) $ export CONVERTED_MODEL=../../test-bf16.gguf
(venv) $ make causal-verify-logits
...
Data saved to data/llamacpp-test-bf16.bin
Data saved to data/llamacpp-test-bf16.txt
Error: llama.cpp logits file not found: data/llamacpp-gemma-3-270m.bin
Please run scripts/run-converted-model.sh first to generate this file.
make: *** [Makefile:62: causal-verify-logits] Error 1
```
With the changes in this commit, the above steps will now work as
expected.
2025-12-13 08:34:26 +01:00
ddh0
1879fc6dc6
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-13 01:17:53 -06:00
ddh0
824bb3aa6e
fix compiler warning, add commented-out logging per token
2025-12-13 00:23:15 -06:00
ddh0
0a19a3fd6c
remove old debug log, style nit
2025-12-12 23:45:45 -06:00
ddh0
94cb883ed9
copy from author
...
ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
2025-12-12 23:19:08 -06:00
ddh0
53380c183f
add missing parameters in `server-task.cpp`
2025-12-12 22:39:51 -06:00
Xuan-Son Nguyen
380b4c984e
common: support negated args (#17919)
...
* args: support negated args
* update docs
* fix typo
* add more neg options
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* rm duplicated arg
* fix LLAMA_ARG_NO_HOST
* add test
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-12 23:58:53 +01:00
Xuan-Son Nguyen
e39a2ce66d
clip: move model cgraphs into their own files (#17965)
...
* clip: move model cgraphs into their own files
* more explicit enums
* fix linux build
* fix naming
* missing headers
* nits: add comments for contributors
2025-12-12 21:14:48 +01:00
jiahao su
a8c7f33d79
ci : change the cann version and the container pull method (#17953)
...
fix error format
Update build.yml
Remove unnecessary zip files
fix
update
2025-12-12 20:43:00 +01:00
Sigbjørn Skjæret
b7f5f46e03
docker : include legacy llama-completion binary (#17964)
2025-12-12 19:39:23 +01:00
Johannes Gäßler
482211438d
CUDA: fix overflow in MMA kernel without stream-k (#17939)
2025-12-12 17:43:58 +01:00
Georgi Gerganov
7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency (#17945)
...
* models : fix the attn_factor for mistral3 graphs
* cont : rework attn_factor correction logic
* cont : make deepseek2 consistent
* cont : add TODO
* cont : special-case DSv2
* cont : revert Mistral 3 Large changes
* cont : fix DS2 to use the original attn_factor
* cont : minor comments
2025-12-12 17:12:40 +02:00