ddh0
7b127e126a
correct function names
2026-02-13 21:17:53 -06:00
ddh0
bddc67547f
correct function names
2026-02-13 21:13:53 -06:00
ddh0
97aefac773
update_stats guard
2026-02-12 20:00:23 -06:00
ddh0
053a28980b
don't double-count `qs`
2026-02-12 18:31:59 -06:00
ddh0
fd3787ee05
typo
2026-02-12 18:24:47 -06:00
ddh0
d648629f56
remove unused `std::vector<ggml_tensor*> tensors;`
2026-02-12 18:24:16 -06:00
ddh0
6734e77662
don't throw by pointer; unify MiB formatting
2026-02-12 18:22:52 -06:00
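The first half of the commit title above refers to the usual C++ guideline: throw exception objects by value and catch them by reference, rather than `throw new ...` and catching pointers. A minimal, generic illustration of the guideline (a sketch, not the actual patch):
```
#include <cstdio>
#include <stdexcept>

static void might_fail(bool ok) {
    if (!ok) {
        // throw by value: the runtime owns the exception object's lifetime
        throw std::runtime_error("something went wrong");
    }
}

int main() {
    try {
        might_fail(false);
    } catch (const std::exception & e) { // catch by (const) reference, not by pointer
        std::fprintf(stderr, "error: %s\n", e.what());
        // with `throw new std::runtime_error(...)` this would instead need
        // `catch (std::exception * e)` plus an explicit `delete e` to avoid a leak
    }
    return 0;
}
```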
ddh0
1f25c130de
pretty error msg
2026-02-12 18:11:44 -06:00
ddh0
67e25bbae1
fix compile errors
2026-02-12 18:02:40 -06:00
ddh0
5d6c92440c
initial commit for branch
2026-02-12 17:52:59 -06:00
ddh0
0301b1db65
Merge branch 'ggml-org:master' into llama-quantize-dry-run
2026-02-12 14:33:16 -06:00
Georgi Gerganov
338085c69e
args : add -kvu to llama-parallel ( #19577 )
2026-02-12 21:52:41 +02:00
ddh0
75ab2b3d16
Merge branch 'ggml-org:master' into llama-quantize-dry-run
2026-02-12 12:56:28 -06:00
Aleksander Grygier
4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output ( #19571 )
2026-02-12 19:55:51 +01:00
Adrien Gallouët
4b385bfcf8
vendor : update cpp-httplib ( #19537 )
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-12 16:11:22 +01:00
Christian Schmitz
f488429380
llama : update outdated comment in llama.h ( #19428 )
* Updated documentation: model is no longer a parameter
* llama : fix trailing whitespace in comment
---------
Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2026-02-12 15:52:57 +01:00
Aleksander Grygier
4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat ( #19556 )
* feat: Enable adding System Prompt per-chat
* fix: Save draft message in Chat Form when adding System Prompt from new chat view
* fix: Proper system message deletion logic
* chore: Formatting
* chore: update webui build output
2026-02-12 13:56:08 +01:00
Daniel Bevenius
ff599039a9
scripts : add support for forks in pr2wt.sh ( #19540 )
This commit adds support for using the pr2wt.sh (pull request to
workspace) script with forks of upstream llama.cpp.
2026-02-12 13:14:28 +01:00
Aleksander Grygier
f486ce9f30
(webui) REFACTOR: UI primitives and polish ( #19551 )
* webui: UI primitives and polish (non-MCP)
* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier
38adc7d469
WebUI Architecture Cleanup ( #19541 )
* webui: architecture foundation (non-MCP core refactors)
* chore: update webui build output
2026-02-12 11:22:27 +01:00
Georgi Gerganov
3b3a948134
metal : update sum_rows kernel to support float4 ( #19524 )
2026-02-12 11:35:28 +02:00
Mario Limonciello
6845f7f87f
Add a workaround for compilation with ROCWMMA_FATTN and gfx9 ( #19461 )
There is an upstream problem [1] with AMD's LLVM 22 fork and
rocWMMA 2.2.0 causing compilation issues on devices without
native fp16 support (CDNA devices).
The specialized types aren't resolved properly:
```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
2549 | using ARegsT = typename Impl::ARegsT;
```
Add a workaround that explicitly declares the types and casts when
compiling with HIP and ROCWMMA_FATTN [2]. Once this is actually fixed
upstream, version guards can detect releases that contain the fix and
limit the workaround to where it is still needed.
Link: https://github.com/ROCm/rocm-libraries/issues/4398 [1]
Link: https://github.com/ggml-org/llama.cpp/issues/19269 [2]
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2026-02-12 09:38:35 +01:00
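The "ambiguous partial specializations" diagnostic quoted in the entry above is a general C++ failure mode, and the described workaround (declaring exact types and casting) sidesteps it. A minimal, self-contained illustration of the ambiguity and one way out (a generic sketch, not the rocWMMA code or the actual llama.cpp patch):
```
#include <cstdio>

template <typename A, typename B> struct mfma_sel;  // primary template, never defined
template <typename A> struct mfma_sel<A, float> { static constexpr int id = 1; };
template <typename B> struct mfma_sel<float, B> { static constexpr int id = 2; };
// Without the line below, mfma_sel<float, float> matches both partial
// specializations equally well, producing the same class of "ambiguous
// partial specializations" error as the amdgcn_mfma diagnostic above:
template <> struct mfma_sel<float, float>       { static constexpr int id = 3; };

int main() {
    std::printf("%d\n", mfma_sel<float, float>::id); // prints 3
    return 0;
}
```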
RichardScottOZ
fa16e517a3
server : fix typo in README.md for features list ( #19510 )
removes an extra 'l' in 'full'
2026-02-12 08:56:25 +01:00
TriDefender
313493de53
docs : update path in snapdragon README.md ( #19533 )
the paths changed, so the original example didn't work
2026-02-12 08:13:51 +01:00
Max Krasnyansky
b1ff83bbb0
hexagon: further optimization and tuning of matmul and dot kernels ( #19407 )
* ggml-hexagon: implement 2x2 matmul kernel
* hexmm: implement vec_dot_rx2x2 for Q8_0 and MXFP4
* hexagon: fix editor config failures
* hexagon: refactor matmul ops to use context struct and remove wrappers; also implement vec_dot_f16 2x2
* hexagon: refactor dyn quantizers to use mmctx
* hexagon: remove mm fastdiv from op_ctx
* hexagon: refactor matmul entry point to reduce code duplication
---------
Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>
2026-02-11 23:04:27 -08:00
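The 2x2 matmul kernel in the entry above is a register-tiling optimization. A generic scalar sketch of the idea (not the Hexagon HVX code): compute a 2x2 block of C per inner loop so every loaded element of A and B is reused twice, roughly halving memory traffic.
```
// C = A (MxK, row-major) * B (KxN, row-major); assumes M and N are even
static void matmul_2x2(const float * A, const float * B, float * C,
                       int M, int N, int K) {
    for (int i = 0; i < M; i += 2) {
        for (int j = 0; j < N; j += 2) {
            float c00 = 0, c01 = 0, c10 = 0, c11 = 0;
            for (int k = 0; k < K; ++k) {
                // two loads from A and two from B feed four accumulators
                const float a0 = A[(i + 0)*K + k], a1 = A[(i + 1)*K + k];
                const float b0 = B[k*N + j + 0],   b1 = B[k*N + j + 1];
                c00 += a0*b0; c01 += a0*b1;
                c10 += a1*b0; c11 += a1*b1;
            }
            C[(i + 0)*N + j + 0] = c00; C[(i + 0)*N + j + 1] = c01;
            C[(i + 1)*N + j + 0] = c10; C[(i + 1)*N + j + 1] = c11;
        }
    }
}
```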
Adrien Gallouët
4ae1b7517a
common : replace deprecated codecvt using parse_utf8_codepoint ( #19517 )
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2026-02-12 07:27:52 +01:00
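For context on the entry above: std::codecvt was deprecated in C++17, so UTF-8 decoding is typically hand-rolled instead. A minimal sketch of the kind of codepoint parser such a replacement implies (a hypothetical helper; the real parse_utf8_codepoint lives in llama.cpp's common code and its signature may differ):
```
#include <cstdint>
#include <string>

// Decode one UTF-8 codepoint starting at `pos` and advance `pos` past it.
// Returns U+FFFD on malformed input; overlong/surrogate checks are omitted.
static uint32_t parse_codepoint(const std::string & s, size_t & pos) {
    const uint8_t b0 = (uint8_t) s[pos];
    int      len;
    uint32_t cp;
    if      (b0 < 0x80) { len = 1; cp = b0;        }
    else if (b0 < 0xC0) { pos += 1; return 0xFFFD; } // stray continuation byte
    else if (b0 < 0xE0) { len = 2; cp = b0 & 0x1F; }
    else if (b0 < 0xF0) { len = 3; cp = b0 & 0x0F; }
    else if (b0 < 0xF8) { len = 4; cp = b0 & 0x07; }
    else                { pos += 1; return 0xFFFD; }
    if (pos + len > s.size()) { pos = s.size(); return 0xFFFD; } // truncated input
    for (int i = 1; i < len; ++i) {
        const uint8_t b = (uint8_t) s[pos + i];
        if ((b & 0xC0) != 0x80) { pos += i; return 0xFFFD; } // bad continuation
        cp = (cp << 6) | (b & 0x3F);
    }
    pos += len;
    return cp;
}
```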
ddh0
f58de63ec3
remove unused `params` parameter
2026-02-11 22:30:06 -06:00
ddh0
44f9fee248
remove per @compilade
2026-02-11 22:23:10 -06:00
ddh0
40528248fc
comment ref #12557
2026-02-11 22:18:56 -06:00
ddh0
b15bb3404c
guard ftype imatrix warning
2026-02-11 21:57:55 -06:00
ddh0
1658228d6a
add back Q2_K edge case for imatrix
2026-02-11 21:53:07 -06:00
ddh0
1ccd7a49ba
simplify for style
2026-02-11 21:41:37 -06:00
ddh0
ae786b862d
simplify and rename `tensor_type_requires_imatrix`
2026-02-11 21:21:40 -06:00
ddh0
22db76409b
add missing `GGML_TYPE`s
2026-02-11 21:14:19 -06:00
ddh0
55dbee2bbe
fixup tensor_requires_imatrix
2026-02-11 21:03:34 -06:00
ddh0
3211a847ef
logic error
2026-02-11 20:58:52 -06:00
ddh0
ea8da0503c
missing __func__, move imatrix flag set
2026-02-11 20:57:16 -06:00
ddh0
2769f35207
new function `tensor_requires_imatrix`, add courtesy warning about imatrix
2026-02-11 20:49:05 -06:00
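The entry above introduces a predicate for quantization targets that cannot be produced without an importance matrix (later renamed, per the log). A hypothetical sketch of such a check; the type list is an assumption based on llama.cpp's low-bit IQ quants, not the actual function body:
```
#include "ggml.h"

// Hypothetical sketch, not the actual llama.cpp code: the very low-bit
// IQ quant types need --imatrix data, so a predicate like this lets the
// tool warn the user up front instead of failing mid-quantization.
static bool tensor_requires_imatrix(ggml_type t) {
    switch (t) {
        case GGML_TYPE_IQ1_S:   // assumed members of the list
        case GGML_TYPE_IQ1_M:
        case GGML_TYPE_IQ2_XXS:
        case GGML_TYPE_IQ2_XS:
            return true;
        default:
            return false;
    }
}
```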
ddh0
07f882bbbb
add example to --help
2026-02-11 15:36:42 -06:00
ddh0
966b21a981
show model and quant BPW when quant completes
2026-02-11 15:30:12 -06:00
ddh0
150e1db21d
fix indent
2026-02-11 14:49:56 -06:00
ddh0
b9b32f0d2d
no need to re-calculate ggml_nbytes for tensor
2026-02-11 14:45:44 -06:00
ddh0
c3f42dedd1
use 6 characters for tensor dims (cont.)
2026-02-11 14:29:22 -06:00
ddh0
56c27b13ad
add --dry-run to llama-quantize
2026-02-11 14:08:17 -06:00
ddh0
0d22288f00
use 6 characters for tensor dims
2026-02-11 14:08:01 -06:00
ddh0
e6b790f470
Merge branch 'ggml-org:master' into llama-quantize-dry-run
2026-02-11 12:53:22 -06:00
ddh0
844ad3e326
clean slate for branch
2026-02-11 12:47:13 -06:00
lhez
4d3daf80f8
opencl: add general Q6_K mm and Q4_K mv ( #19347 )
* opencl: add general q6_k mm
* opencl: refine condition for q6_K mm
* opencl: add general q4_K mv
* opencl: fix whitespace
2026-02-11 10:33:13 -08:00
Georgi Gerganov
914dde72ba
ggml : unary ops support non-cont src0 + metal F16 unary ops ( #19511 )
* ggml : unary ops support non-cont src0
* metal : support F16 unary ops + fix ELU
2026-02-11 18:58:43 +02:00
Daniel Bevenius
3136a849db
common : remove unused token util functions ( #19506 )
This commit removes two unused functions, `common_lcp` and `common_lcs`.
Their last usage was removed in commit 33eff40240 ("server : vision support
via libmtmd"), and they are no longer referenced anywhere in the codebase.
2026-02-11 17:41:35 +01:00