Commit Graph

8043 Commits

Author SHA1 Message Date
ddh0 0301b1db65
Merge branch 'ggml-org:master' into llama-quantize-dry-run 2026-02-12 14:33:16 -06:00
Georgi Gerganov 338085c69e
args : add -kvu to llama-parallel (#19577) 2026-02-12 21:52:41 +02:00
ddh0 75ab2b3d16
Merge branch 'ggml-org:master' into llama-quantize-dry-run 2026-02-12 12:56:28 -06:00
Aleksander Grygier 4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output (#19571) 2026-02-12 19:55:51 +01:00
Adrien Gallouët 4b385bfcf8
vendor : update cpp-httplib (#19537)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-12 16:11:22 +01:00
Christian Schmitz f488429380
llama : update outdated comment in llama.h (#19428)
* Updated documentation

Model is no longer a parameter

* llama : fix trailing whitespace in comment

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2026-02-12 15:52:57 +01:00
Aleksander Grygier 4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat (#19556)
* feat: Enable adding System Prompt per-chat

* fix: Save draft message in Chat Form when adding System Prompt from new chat view

* fix: Proper system message deletion logic

* chore: Formatting

* chore: update webui build output
2026-02-12 13:56:08 +01:00
Daniel Bevenius ff599039a9
scripts : add support for forks in pr2wt.sh (#19540)
This commit adds support for using the pr2wt.sh (pull request to
workspace) script with forks of upstream llama.cpp.
2026-02-12 13:14:28 +01:00
Aleksander Grygier f486ce9f30
(webui) REFACTOR: UI primitives and polish (#19551)
* webui: UI primitives and polish (non-MCP)

* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier 38adc7d469
WebUI Architecture Cleanup (#19541)
* webui: architecture foundation (non-MCP core refactors)

* chore: update webui build output
2026-02-12 11:22:27 +01:00
Georgi Gerganov 3b3a948134
metal : update sum_rows kernel to support float4 (#19524) 2026-02-12 11:35:28 +02:00
Mario Limonciello 6845f7f87f
Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461)
There is an upstream problem [1] with AMD's LLVM 22 fork and
rocWMMA 2.2.0 causing compilation issues on devices without
native fp16 support (CDNA devices).

The specialized types aren't resolved properly:
```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
```

Add a workaround to explicitly declare the types and cast when
compiling with HIP and ROCWMMA_FATTN [2].  When this is actually
fixed upstream some guards can be used to detect and wrap the
version that has the fix to only apply when necessary.

Link: https://github.com/ROCm/rocm-libraries/issues/4398 [1]
Link: https://github.com/ggml-org/llama.cpp/issues/19269 [2]

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2026-02-12 09:38:35 +01:00
RichardScottOZ fa16e517a3
server : fix typo in README.md for features list (#19510)
extra l for full
2026-02-12 08:56:25 +01:00
TriDefender 313493de53
docs : update path in snapdragon README.md (#19533)
paths changed so original example didn't work
2026-02-12 08:13:51 +01:00
Max Krasnyansky b1ff83bbb0
hexagon: further optimization and tuning of matmul and dot kernels (#19407)
* ggml-hexagon: implement 2x2 matmul kernel

* hexmm: implement vec_dot_rx2x2 for Q8_0 and MXFP4

* hexagon: fix editor config failures

* hexagon: refactor matmul ops to use context struct and remove wrappers

Also implement vec_dot_f16 2x2

* hexagon: refactor dyn quantizers to use mmctx

* hexagon: remove mm fastdiv from op_ctx

* hexagon: refactor matmul entry point to reduce code duplication

---------

Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>
2026-02-11 23:04:27 -08:00
Adrien Gallouët 4ae1b7517a
common : replace deprecated codecvt using parse_utf8_codepoint (#19517)
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2026-02-12 07:27:52 +01:00
ddh0 f58de63ec3 remove unused `params` parameter 2026-02-11 22:30:06 -06:00
ddh0 44f9fee248 remove per @compilade 2026-02-11 22:23:10 -06:00
ddh0 40528248fc comment ref #12557 2026-02-11 22:18:56 -06:00
ddh0 b15bb3404c guard ftype imatrix warning 2026-02-11 21:57:55 -06:00
ddh0 1658228d6a add back Q2_K edge case for imatrix 2026-02-11 21:53:07 -06:00
ddh0 1ccd7a49ba simplify for style 2026-02-11 21:41:37 -06:00
ddh0 ae786b862d simplify and rename `tensor_type_requires_imatrix` 2026-02-11 21:21:40 -06:00
ddh0 22db76409b add missing `GGML_TYPE`s 2026-02-11 21:14:19 -06:00
ddh0 55dbee2bbe fixup tensor_requires_imatrix 2026-02-11 21:03:34 -06:00
ddh0 3211a847ef logic error 2026-02-11 20:58:52 -06:00
ddh0 ea8da0503c missing __func__, move imatrix flag set 2026-02-11 20:57:16 -06:00
ddh0 2769f35207 new function `tensor_requires_imatrix`, add courtesy warning about imatrix 2026-02-11 20:49:05 -06:00
ddh0 07f882bbbb add example to --help 2026-02-11 15:36:42 -06:00
ddh0 966b21a981 show model and quant BPW when quant completes 2026-02-11 15:30:12 -06:00
ddh0 150e1db21d fix indent 2026-02-11 14:49:56 -06:00
ddh0 b9b32f0d2d no need to re-calculate ggml_nbytes for tensor 2026-02-11 14:45:44 -06:00
ddh0 c3f42dedd1 use 6 characters for tensor dims (cont.) 2026-02-11 14:29:22 -06:00
ddh0 56c27b13ad add --dry-run to llama-quantize 2026-02-11 14:08:17 -06:00
ddh0 0d22288f00 use 6 characters for tensor dims 2026-02-11 14:08:01 -06:00
ddh0 e6b790f470
Merge branch 'ggml-org:master' into llama-quantize-dry-run 2026-02-11 12:53:22 -06:00
ddh0 844ad3e326 clean slate for branch 2026-02-11 12:47:13 -06:00
lhez 4d3daf80f8
opencl: add general Q6_K mm and Q4_K mv (#19347)
* opencl: add general q6_k mm

* opencl: refine condition for q6_K mm

* opencl: add general q4_K mv

* opencl: fix whitespace
2026-02-11 10:33:13 -08:00
Georgi Gerganov 914dde72ba
ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511)
* ggml : unary ops support non-cont src0

* metal : support F16 unary ops + fix ELU
2026-02-11 18:58:43 +02:00
Daniel Bevenius 3136a849db
common : remove unused token util functions (#19506)
This commit removes two unused functions `common_lcp` and `common_lcs`.
The last usage of these functions was removed in
Commit 33eff40240 ("server : vision support
via libmtmd") and are no longer used anywhere in the codebase.
2026-02-11 17:41:35 +01:00
AesSedai e463bbdf65
model: Add Kimi-K2.5 support (#19170)
* Move dequant_model to after the text_config merge
Add new kimi-k2.5 keys to mtmd convert
Update V_MMPROJ tensor mapping for new mm_projector.proj keys
Update V_M_IMP_NORM for new mm_projector.pre_norm key

* Fix a couple of oversights

* Add image support for Kimi-K2.5

* Revert changes to KimiVLForConditionalGeneration

* Fix an assert crash

* Fix permute swapping w / h on accident

* Kimi-K2.5: Use merged QKV for vision

* Kimi-K2.5: pre-convert vision QK to use build_rope_2d

* Kimi-K2.5: support non-interleaved rope for vision

* Kimi-K2.5: fix min / max pixel

* Kimi-K2.5: remove v/o permutes, unnecessary

* Kimi-K2.5: update permute name to match

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-11 16:47:30 +01:00
Daniel Bevenius 53de59f67d
build : fix case in dSYMs path for build-macos [no ci] (#19515)
This commit updates an incorrect dSYMs where the the 's' was uppercase
by mistake.

The motivation for fixing this is that this can cause issues on case
sensitive operating systems.

Refs: https://github.com/ggml-org/whisper.cpp/pull/3630
2026-02-11 14:02:29 +01:00
Georgi Gerganov 9ab072ebbe
metal : extend l2_norm support for non-cont src0 (#19502) 2026-02-11 14:53:19 +02:00
Johannes Gäßler ada90bf2ba
docs: ban AI for issues and discussions [no CI] (#19512) 2026-02-11 12:49:40 +01:00
Adrien Gallouët 0c1f39a9ae
common : improve download error reporting (#19491)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-11 09:27:55 +01:00
Max Krasnyansky 73cd5e1b97
hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (#19406)
* hexagon: add ARGSORT op

Co-authored-by: Yarden Tal <yardent@qti.qualcomm.com>

* hexagon: argsort reject tensors with huge rows for now

* Adding support for DIV,SQR,SQRT,SUM_ROWS ops in hexagon backend

* hexagon : Add GEGLU op

* hexagon: fix editor config check

* hexagon: rewrite and optimize binary ops ADD/SUB/MUL/DIV/ADD_ID to use DMA

---------

Co-authored-by: Yarden Tal <yardent@qti.qualcomm.com>
Co-authored-by: Manohara Hosakoppa Krishnamurthy <mhosakop@qti.qualcomm.com>
2026-02-10 23:21:12 -08:00
thecaptain789 8ee538ce73
llama : correct typos 'occured' and 'occurences' (#19414)
Co-authored-by: thecaptain789 <thecaptain789@users.noreply.github.com>
2026-02-11 07:05:31 +01:00
Georgi Gerganov 6d95707827
model : fix wavtokenizer embedding notions (#19479) 2026-02-11 07:52:20 +02:00
Georgi Gerganov 89181c0b6d
ggml : extend bin bcast for permuted src1 (#19484)
* tests : extend bin bcast for permuted src1

* cont : extend bin support

* cont : s0 is always 1

* tests : simplify
2026-02-11 07:52:00 +02:00
Georgi Gerganov ceaa89b786
metal : consolidate unary ops (#19490) 2026-02-11 07:51:12 +02:00