Commit Graph

8052 Commits

Author SHA1 Message Date
ddh0 bddc67547f correct function names 2026-02-13 21:13:53 -06:00
ddh0 97aefac773 update_stats guard 2026-02-12 20:00:23 -06:00
ddh0 053a28980b don't double-count `qs` 2026-02-12 18:31:59 -06:00
ddh0 fd3787ee05 typo 2026-02-12 18:24:47 -06:00
ddh0 d648629f56 remove unused `std::vector<ggml_tensor*> tensors;` 2026-02-12 18:24:16 -06:00
ddh0 6734e77662 don't throw by pointer; unify MiB formatting 2026-02-12 18:22:52 -06:00
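The "don't throw by pointer" fix above refers to the standard C++ guideline: throw exceptions by value and catch them by reference, never `throw new ...`. A minimal sketch of the idiom (not the actual llama-quantize code):

```cpp
#include <stdexcept>
#include <string>

// Hedged sketch of the "don't throw by pointer" idiom: throw by value,
// catch by const reference, so no caller ever has to delete an exception.
static void check_or_fail(bool ok, const std::string & msg) {
    if (!ok) {
        // bad:  throw new std::runtime_error(msg);  // leaks unless the caller deletes
        throw std::runtime_error(msg);               // good: throw by value
    }
}

int main() {
    try {
        check_or_fail(false, "tensor data is truncated");
    } catch (const std::runtime_error & e) {         // good: catch by const ref
        return 1;                                    // e.what() carries the message
    }
    return 0;
}
```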
ddh0 1f25c130de pretty error msg 2026-02-12 18:11:44 -06:00
ddh0 67e25bbae1 fix compile errors 2026-02-12 18:02:40 -06:00
ddh0 5d6c92440c initial commit for branch 2026-02-12 17:52:59 -06:00
ddh0 0301b1db65 Merge branch 'ggml-org:master' into llama-quantize-dry-run 2026-02-12 14:33:16 -06:00
Georgi Gerganov 338085c69e args : add -kvu to llama-parallel (#19577) 2026-02-12 21:52:41 +02:00
ddh0 75ab2b3d16 Merge branch 'ggml-org:master' into llama-quantize-dry-run 2026-02-12 12:56:28 -06:00
Aleksander Grygier 4c61875bf8 webui: Add switcher to Chat Message UI to show raw LLM output (#19571) 2026-02-12 19:55:51 +01:00
Adrien Gallouët 4b385bfcf8 vendor : update cpp-httplib (#19537) 2026-02-12 16:11:22 +01:00
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Christian Schmitz f488429380 llama : update outdated comment in llama.h (#19428) 2026-02-12 15:52:57 +01:00
* Updated documentation: the model is no longer a parameter
* llama : fix trailing whitespace in comment
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
Aleksander Grygier 4d688f9ebb webui : enable adding or injecting a System Message into chat (#19556) 2026-02-12 13:56:08 +01:00
* feat: enable adding a System Prompt per chat
* fix: save the draft message in the Chat Form when adding a System Prompt from the new-chat view
* fix: proper system message deletion logic
* chore: formatting
* chore: update webui build output
Daniel Bevenius ff599039a9 scripts : add support for forks in pr2wt.sh (#19540) 2026-02-12 13:14:28 +01:00
This commit adds support for using the pr2wt.sh (pull request to workspace) script with forks of upstream llama.cpp.
Aleksander Grygier f486ce9f30 webui : refactor UI primitives and polish (#19551) 2026-02-12 12:21:00 +01:00
* webui: UI primitives and polish (non-MCP)
* chore: update webui build output
Aleksander Grygier 38adc7d469 webui : architecture cleanup (#19541) 2026-02-12 11:22:27 +01:00
* webui: architecture foundation (non-MCP core refactors)
* chore: update webui build output
Georgi Gerganov 3b3a948134 metal : update sum_rows kernel to support float4 (#19524) 2026-02-12 11:35:28 +02:00
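The float4 change vectorizes the row reduction four columns at a time. A CPU-side C++ sketch of the same idea (the real kernel uses Metal's float4 type; this is illustrative only):

```cpp
#include <cstdint>

// Hedged CPU-side sketch of the float4 idea in the Metal sum_rows kernel:
// accumulate four columns per step in four independent lanes, then fold
// the lanes and handle the scalar tail for widths not divisible by 4.
float sum_row(const float * row, int64_t n) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    int64_t i = 0;
    for (; i + 4 <= n; i += 4) {
        lane[0] += row[i + 0];
        lane[1] += row[i + 1];
        lane[2] += row[i + 2];
        lane[3] += row[i + 3];
    }
    float sum = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < n; ++i) {
        sum += row[i]; // scalar tail
    }
    return sum;
}
```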
Mario Limonciello 6845f7f87f Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461) 2026-02-12 09:38:35 +01:00
There is an upstream problem [1] with AMD's LLVM 22 fork and rocWMMA 2.2.0 that causes compilation failures on devices without native fp16 support (CDNA devices).

The specialized types aren't resolved properly:
```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
```

Add a workaround that explicitly declares the types and casts when compiling with HIP and ROCWMMA_FATTN [2]. Once this is actually fixed upstream, version guards can be added so the workaround is only applied where necessary.

Link: https://github.com/ROCm/rocm-libraries/issues/4398 [1]
Link: https://github.com/ggml-org/llama.cpp/issues/19269 [2]

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
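The shape of such a workaround, as a hedged sketch rather than the actual patch: name `rocwmma::float16_t` explicitly in the fragment types instead of letting `__half` drive template deduction, and cast the `__half` buffers at the load/store boundary. The fragment and mma calls below follow the public rocWMMA API; everything else is illustrative.

```cpp
#include <rocwmma/rocwmma.hpp>

// Hedged sketch: declare the rocWMMA types explicitly so the ambiguous
// amdgcn_mfma<__half, ...> specialization is never instantiated, and
// reinterpret the __half buffers at the boundary.
using half_t = rocwmma::float16_t; // explicit type, sidesteps the ambiguity

using frag_a   = rocwmma::fragment<rocwmma::matrix_a,    16, 16, 16, half_t, rocwmma::row_major>;
using frag_b   = rocwmma::fragment<rocwmma::matrix_b,    16, 16, 16, half_t, rocwmma::col_major>;
using frag_acc = rocwmma::fragment<rocwmma::accumulator, 16, 16, 16, half_t>;

__device__ void mma_tile(const __half * A, const __half * B, __half * C, int ld) {
    frag_a a; frag_b b; frag_acc acc;
    rocwmma::fill_fragment(acc, half_t(0));
    // cast the __half pointers to the explicitly-declared rocWMMA type
    rocwmma::load_matrix_sync(a, reinterpret_cast<const half_t *>(A), ld);
    rocwmma::load_matrix_sync(b, reinterpret_cast<const half_t *>(B), ld);
    rocwmma::mma_sync(acc, a, b, acc);
    rocwmma::store_matrix_sync(reinterpret_cast<half_t *>(C), acc, ld, rocwmma::mem_row_major);
}
```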
RichardScottOZ fa16e517a3 server : fix typo in README.md features list (#19510) 2026-02-12 08:56:25 +01:00
Removes an extra "l" from "full".
TriDefender 313493de53 docs : update path in snapdragon README.md (#19533) 2026-02-12 08:13:51 +01:00
The paths changed, so the original example no longer worked.
Max Krasnyansky b1ff83bbb0 hexagon: further optimization and tuning of matmul and dot kernels (#19407) 2026-02-11 23:04:27 -08:00
* ggml-hexagon: implement 2x2 matmul kernel
* hexmm: implement vec_dot_rx2x2 for Q8_0 and MXFP4
* hexagon: fix editorconfig failures
* hexagon: refactor matmul ops to use a context struct and remove wrappers; also implement vec_dot_f16 2x2
* hexagon: refactor dyn quantizers to use mmctx
* hexagon: remove mm fastdiv from op_ctx
* hexagon: refactor matmul entry point to reduce code duplication
Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>
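A 2x2 matmul kernel produces a 2x2 output tile per pass over k, so each value loaded from either input is used twice. A scalar C++ sketch of the blocking (the real kernels vectorize this with Hexagon HVX; this is illustrative only):

```cpp
#include <cstdint>

// Hedged scalar sketch of a 2x2-blocked dot kernel: two rows of A and two
// columns of B produce four outputs per pass over k, halving the loads per
// multiply-accumulate compared to four independent dot products.
void vec_dot_2x2(const float * a0, const float * a1,  // two rows of A
                 const float * b0, const float * b1,  // two columns of B
                 float * c, int64_t k) {
    float c00 = 0, c01 = 0, c10 = 0, c11 = 0;
    for (int64_t i = 0; i < k; ++i) {
        const float x0 = a0[i], x1 = a1[i];
        const float y0 = b0[i], y1 = b1[i];
        c00 += x0 * y0; c01 += x0 * y1;   // each load feeds two products
        c10 += x1 * y0; c11 += x1 * y1;
    }
    c[0] = c00; c[1] = c01; c[2] = c10; c[3] = c11;
}
```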
Adrien Gallouët 4ae1b7517a common : replace deprecated codecvt with parse_utf8_codepoint (#19517) 2026-02-12 07:27:52 +01:00
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
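`std::codecvt` has been deprecated since C++17, hence the switch to a hand-rolled codepoint parser. A hedged decoder sketch (the in-tree `parse_utf8_codepoint` signature may differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hedged sketch of manual UTF-8 decoding (the in-tree parse_utf8_codepoint
// may differ): returns the decoded codepoint and the number of bytes
// consumed, or {0xFFFD, 1} for an invalid sequence.
std::pair<uint32_t, size_t> parse_utf8_codepoint_sketch(const std::string & s, size_t pos) {
    const uint8_t b0 = (uint8_t) s[pos];
    if (b0 < 0x80) { return { b0, 1 }; }                       // 1-byte ASCII
    size_t len; uint32_t cp;
    if      ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; } // 110xxxxx
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; } // 1110xxxx
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; } // 11110xxx
    else                          { return { 0xFFFD, 1 }; }    // invalid lead byte
    for (size_t i = 1; i < len; ++i) {
        if (pos + i >= s.size() || ((uint8_t) s[pos + i] & 0xC0) != 0x80) {
            return { 0xFFFD, 1 };                              // bad continuation byte
        }
        cp = (cp << 6) | ((uint8_t) s[pos + i] & 0x3F);        // append 6 payload bits
    }
    return { cp, len };
}
```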
ddh0 f58de63ec3 remove unused `params` parameter 2026-02-11 22:30:06 -06:00
ddh0 44f9fee248 remove per @compilade 2026-02-11 22:23:10 -06:00
ddh0 40528248fc comment ref #12557 2026-02-11 22:18:56 -06:00
ddh0 b15bb3404c guard ftype imatrix warning 2026-02-11 21:57:55 -06:00
ddh0 1658228d6a add back Q2_K edge case for imatrix 2026-02-11 21:53:07 -06:00
ddh0 1ccd7a49ba simplify for style 2026-02-11 21:41:37 -06:00
ddh0 ae786b862d simplify and rename `tensor_type_requires_imatrix` 2026-02-11 21:21:40 -06:00
ddh0 22db76409b add missing `GGML_TYPE`s 2026-02-11 21:14:19 -06:00
ddh0 55dbee2bbe fixup tensor_requires_imatrix 2026-02-11 21:03:34 -06:00
ddh0 3211a847ef logic error 2026-02-11 20:58:52 -06:00
ddh0 ea8da0503c missing __func__, move imatrix flag set 2026-02-11 20:57:16 -06:00
ddh0 2769f35207 new function `tensor_requires_imatrix`, add courtesy warning about imatrix 2026-02-11 20:49:05 -06:00
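The courtesy warning makes sense because ggml's 1- and 2-bit IQ types quantize poorly, or not at all, without importance-matrix data. A hedged sketch of what such a predicate might check (the in-tree function and exact type list may differ):

```cpp
#include "ggml.h"

// Hedged sketch (the in-tree tensor_requires_imatrix may differ): the
// 1- and 2-bit IQ types need importance-matrix data to quantize well,
// so a predicate like this can gate a warning or a hard error.
static bool tensor_type_requires_imatrix_sketch(enum ggml_type t) {
    switch (t) {
        case GGML_TYPE_IQ1_S:
        case GGML_TYPE_IQ1_M:
        case GGML_TYPE_IQ2_XXS:
        case GGML_TYPE_IQ2_XS:
        case GGML_TYPE_IQ2_S:
            return true;
        default:
            return false;
    }
}
```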
ddh0 07f882bbbb add example to --help 2026-02-11 15:36:42 -06:00
ddh0 966b21a981 show model and quant BPW when quant completes 2026-02-11 15:30:12 -06:00
ddh0 150e1db21d fix indent 2026-02-11 14:49:56 -06:00
ddh0 b9b32f0d2d no need to re-calculate ggml_nbytes for tensor 2026-02-11 14:45:44 -06:00
ddh0 c3f42dedd1 use 6 characters for tensor dims (cont.) 2026-02-11 14:29:22 -06:00
ddh0 56c27b13ad add --dry-run to llama-quantize 2026-02-11 14:08:17 -06:00
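A --dry-run mode can walk the tensors, pick destination types, and total the would-be output size without quantizing or writing anything. A hedged sketch of the size accounting (only `ggml_row_size` and `ggml_nrows` are real ggml API; the function and variable names are illustrative):

```cpp
#include "ggml.h"

// Hedged sketch of the size accounting a --dry-run pass needs: the
// quantized size of a tensor is its row size in the destination type
// times its row count. ggml_row_size() and ggml_nrows() are real ggml
// API; the wrapper itself is illustrative, not the in-tree code.
static size_t quantized_tensor_size(const struct ggml_tensor * t, enum ggml_type new_type) {
    return ggml_row_size(new_type, t->ne[0]) * ggml_nrows(t);
}

// In the main quantize loop, a dry run would accumulate this per tensor
// and then skip the actual quantize + write steps, e.g.:
//     total_size_new += quantized_tensor_size(tensor, new_type);
//     if (dry_run) { continue; }
```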
ddh0 0d22288f00 use 6 characters for tensor dims 2026-02-11 14:08:01 -06:00
ddh0 e6b790f470 Merge branch 'ggml-org:master' into llama-quantize-dry-run 2026-02-11 12:53:22 -06:00
ddh0 844ad3e326 clean slate for branch 2026-02-11 12:47:13 -06:00
lhez 4d3daf80f8 opencl: add general Q6_K mm and Q4_K mv (#19347) 2026-02-11 10:33:13 -08:00
* opencl: add general q6_K mm
* opencl: refine condition for q6_K mm
* opencl: add general q4_K mv
* opencl: fix whitespace
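A quantized matrix-vector (mv) kernel walks each weight row block by block, dequantizing on the fly and accumulating against the input vector. A hedged C++ sketch of that structure (the Q4_K/Q6_K block layouts are omitted; `dequant_block` is a hypothetical stand-in for the per-type decode):

```cpp
#include <cstdint>
#include <functional>

// Hedged sketch of the structure behind a quantized mat-vec kernel.
// `dequant_block` is hypothetical: it decodes one packed block of
// `block_size` weights into floats. Requires block_size <= 256.
void mat_vec_quantized(const uint8_t * w, int64_t rows, int64_t cols,
                       int64_t block_size, int64_t block_bytes,
                       const std::function<void(const uint8_t *, float *)> & dequant_block,
                       const float * x, float * y) {
    const int64_t blocks_per_row = cols / block_size;
    for (int64_t r = 0; r < rows; ++r) {
        float sum = 0.0f;
        const uint8_t * row = w + r * blocks_per_row * block_bytes;
        for (int64_t b = 0; b < blocks_per_row; ++b) {
            float tmp[256];                       // Q4_K uses 256-wide super-blocks
            dequant_block(row + b * block_bytes, tmp);
            for (int64_t i = 0; i < block_size; ++i) {
                sum += tmp[i] * x[b * block_size + i];
            }
        }
        y[r] = sum;
    }
}
```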
Georgi Gerganov 914dde72ba ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511) 2026-02-11 18:58:43 +02:00
* ggml : unary ops support non-cont src0
* metal : support F16 unary ops + fix ELU
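Supporting a non-contiguous src0 means the op has to address elements through the tensor's byte strides instead of assuming packed rows. A hedged 2-D sketch using ggml's public `ne`/`nb` fields (not the in-tree implementation):

```cpp
#include "ggml.h"

// Hedged sketch of a unary op over a possibly non-contiguous 2-D src0:
// each element is located via the byte strides nb[] rather than by
// flat indexing, so permuted or sliced tensors work too.
void unary_neg_f32_2d(const struct ggml_tensor * src0, struct ggml_tensor * dst) {
    for (int64_t i1 = 0; i1 < src0->ne[1]; ++i1) {
        for (int64_t i0 = 0; i0 < src0->ne[0]; ++i0) {
            const float * px = (const float *)((const char *)src0->data + i1*src0->nb[1] + i0*src0->nb[0]);
            float       * py = (float       *)((      char *)dst ->data + i1*dst ->nb[1] + i0*dst ->nb[0]);
            *py = -*px;
        }
    }
}
```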
Daniel Bevenius 3136a849db common : remove unused token util functions (#19506) 2026-02-11 17:41:35 +01:00
This commit removes two unused functions, `common_lcp` and `common_lcs`. Their last usage was removed in commit 33eff40240 ("server : vision support via libmtmd"), and they are no longer referenced anywhere in the codebase.
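For the record, a longest-common-prefix helper over token sequences looks roughly like the following (a hedged sketch; the removed `common_lcp` signature may have differed):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token = int32_t; // matches llama.h's token type

// Hedged sketch of the removed helper's likely behavior: length of the
// longest common prefix of two token sequences, the kind of primitive
// used for prompt-cache reuse.
size_t common_lcp_sketch(const std::vector<llama_token> & a,
                         const std::vector<llama_token> & b) {
    size_t i = 0;
    while (i < a.size() && i < b.size() && a[i] == b[i]) {
        ++i;
    }
    return i;
}
```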
AesSedai e463bbdf65 model: Add Kimi-K2.5 support (#19170) 2026-02-11 16:47:30 +01:00
* Move dequant_model to after the text_config merge; add the new kimi-k2.5 keys to the mtmd convert; update the V_MMPROJ tensor mapping for the new mm_projector.proj keys; update V_M_IMP_NORM for the new mm_projector.pre_norm key
* Fix a couple of oversights
* Add image support for Kimi-K2.5
* Revert changes to KimiVLForConditionalGeneration
* Fix an assert crash
* Fix permute accidentally swapping w / h
* Kimi-K2.5: use merged QKV for vision
* Kimi-K2.5: pre-convert vision QK to use build_rope_2d
* Kimi-K2.5: support non-interleaved rope for vision
* Kimi-K2.5: fix min / max pixel
* Kimi-K2.5: remove unnecessary v/o permutes
* Kimi-K2.5: update permute name to match
* Update convert_hf_to_gguf.py
* Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
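The final change above trades a `ggml_cont` copy for strided views. A hedged sketch of the general pattern (shapes and names illustrative, not the actual Kimi-K2.5 vision graph):

```cpp
#include "ggml.h"

// Hedged sketch of replacing a ggml_cont copy with a strided view:
// ggml_view_3d reinterprets a region of `cur` in place, so no data moves.
// The split of a head's channels into halves is illustrative only.
struct ggml_tensor * split_head_view(struct ggml_context * ctx, struct ggml_tensor * cur,
                                     int64_t d_head, int64_t n_head, int64_t n_tok) {
    // view the first half of each head's channels without copying:
    // the nb[] strides are kept from the source, the offset picks the slice
    return ggml_view_3d(ctx, cur,
                        d_head/2, n_head, n_tok,   // new ne0, ne1, ne2
                        cur->nb[1], cur->nb[2],    // keep source strides
                        0);                        // byte offset into cur
}
```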