Commit Graph

8080 Commits

Author SHA1 Message Date
Piotr Wilkin (ilintar) a4230e169e
Merge e384c6fefe into 0ccbfdef3e 2026-02-14 00:28:58 +00:00
Piotr Wilkin e384c6fefe Add "marker" PEG parser + sample in analyzer 2026-02-14 01:28:53 +01:00
Max Krasnyansky 0ccbfdef3e
hexagon: further optimizations and refactoring for flash attention (#19583)
* ggml-hexagon: fa improvements

ggml-hexagon: optimize flash attention calculations with improved variable handling

ggml-hexagon: streamline flash attention operations by removing redundant checks for FP32

ggml-hexagon: optimize hvx_dot_f16_f16_aa_rx2 by simplifying variable handling for unused elements

ggml-hexagon: optimize flash attention by changing slope vector type to F16

* hexfa: fixed test-backend-ops failurs due to leftover element handling

* hexagon: refactor and optimize fa to use local context struct

* ggml-hexagon: optimize flash-attention using hvx_vec_expf

Use HVX for online softmax.

---------

Co-authored-by: chraac <chraac@gmail.com>
2026-02-13 16:27:30 -08:00
Mengsheng Wu 94a602db66
github : add missing backends to issue templates (#19603) 2026-02-14 00:56:53 +01:00
Piotr Wilkin e501e1dec9 Basic universal PEG parser wrapper with tag-to-dict based extractor 2026-02-14 00:56:22 +01:00
Piotr Wilkin 0884aad1c5 Remove stupid LLM-generated method comment headers (yeah, we can see what the method name is, thank you very much) 2026-02-14 00:37:33 +01:00
Piotr Wilkin 61e18cad3f Create basic content parser if no parser definition found 2026-02-14 00:26:17 +01:00
Piotr Wilkin 3605e78569 Refactor into class-based approach 2026-02-14 00:17:43 +01:00
Jeff Bolz 05a6f0e894
vulkan: restore -inf check in FA shaders (#19582) 2026-02-13 13:35:29 -06:00
Adrien Gallouët b48e80f677
common : update download code (#19573)
* common : remove legacy .json to .etag migration code

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* common : simplify common_download_file_single_online

This commit also force a redownload if the file exists
but has no .etag file.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-13 15:10:46 +01:00
Xuan-Son Nguyen 752584d5f5
model: support GLM MoE DSA arch (NOTE: indexer is not yet supported) (#19460)
* model: support GLM MoE DSA arch

* working version

* pyright

* keep indexer tensors

* add indexer gguf params

* loaded now

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* update

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* minor fix and cleanup

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-13 14:56:53 +01:00
Piotr Wilkin 6415d0f03f Add TODO 2026-02-13 14:42:26 +01:00
Alberto Cabrera Pérez cc2aa81513
Fix wrong memcpy length for block_interleave == 4 (#19575) 2026-02-13 20:32:14 +08:00
ymcki 0e21991472
fix vulkan ggml_acc only works in 3d but not 4d (#19426)
* fix vulkan ggml_acc only works in 3d but not 4d

* removed clamp in test_acc_block

* use the correct stride and its test case

* cuda : fix "supports op" condition

* change src0 to src1 in ggml_vk_acc. Update acc.comp with jeffbolznv\'s suggestion except to keep the boundary check

* version without boundary check

* revert back to boundary check version

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-02-13 13:31:37 +01:00
Sigbjørn Skjæret b2ecc0cdb4
support --verbose-prompt (#19576) 2026-02-13 12:49:10 +01:00
Aman Gupta 5065da554e
CUDA: loop over ne2*ne3 in case it overflows (#19538)
* CUDA: loop over ne2*ne3 in case it overflows

* use fastdiv
2026-02-13 17:01:40 +05:30
Aleksander Grygier 5174d7206f
webui: UI and routing fixes (#19586)
* chore: update webui build output

* chore: update webui build output

* fix: Scroll issues in DropdownMenuSearchable

* webui: fix redirect to root ignoring base path

* fix: Word wrapping

* fix: remove obsolete modality UI tests causing CI failures

- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)

* feat: Improve formatting performance time

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Oliver Simons 43919b7f4f
CUDA: Do not mutate cgraph for fused ADDs (#19566)
* Do not mutate cgraph for fused ADDs

1. We should try to minimize in-place changes to the incoming
   ggml_cgraph where possible (those should happen in graph_optimize)
2. Modifying in-place leads to an additional, unnecessary graph capture
   step as we store the properties before modifying the graph in-place
   in the cuda-backend

* Assert ggml_tensor is trivially copyable

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
2026-02-13 15:07:55 +05:30
Pavan Shinde 423cf0b26f
docs : fix broken link and typo (#19560) 2026-02-13 09:38:09 +01:00
ymcki 33a56f90a6
model : Kimi Linear fix conv state update (#19531)
* fix conv state update for llama-server parallel serving

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
2026-02-13 09:10:18 +01:00
Adrien Gallouët 25224c8021
llama : remove deprecated codecvt (#19565)
Using the same conversion function ensures a consistent matching between
the regex pattern and the text.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-13 06:43:53 +01:00
Adrien Gallouët 2f5d8f8edc
vendor : update BoringSSL to 0.20260211.0 (#19562)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-13 06:43:26 +01:00
Georgi Gerganov bb96bfd361
memory : fix kv cache size for hybrid models (#19559) 2026-02-13 07:36:24 +02:00
Georgi Gerganov 0644baefde
metal : improve concurrency (#19555) 2026-02-13 07:35:57 +02:00
Georgi Gerganov 490eb96b88
metal : support GGML_OP_SET (#19548) 2026-02-13 07:34:52 +02:00
Piotr Wilkin 24cc1bcd6d Clean algorithm for calculate_diff_split; fix buggy expectations 2026-02-13 03:17:20 +01:00
Piotr Wilkin e772822011 Whitespace 2026-02-13 00:55:56 +01:00
Piotr Wilkin 28fcef67c0 -> Refactor autoparser analyzer structure
-> Fix content truncation
-> Fix errors in capability detection due to non-empty assistant message
-> Add missing debug prints for Jinja
2026-02-13 00:55:35 +01:00
Shupei Fan 3bb78133ab
hexagon: fix typo in vtcm_needs_release (#19545) 2026-02-12 15:07:49 -08:00
lhez 79cc0f2daf
opencl: add basic support for q4_1 (#19534)
* opencl: add q4_1 mv

* opencl: clean up

* opencl: add flattened q4_1 mv

* opencl: clean up

* opencl: add basic q4_1 mm

* opencl: fix whitespace

* opencl: add general q4_0 mm
2026-02-12 14:52:37 -08:00
Georgi Gerganov 338085c69e
args : add -kvu to llama-parallel (#19577) 2026-02-12 21:52:41 +02:00
Aleksander Grygier 4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output (#19571) 2026-02-12 19:55:51 +01:00
Piotr Wilkin 822fd2bee9 Whoops 2026-02-12 17:22:59 +01:00
Adrien Gallouët 4b385bfcf8
vendor : update cpp-httplib (#19537)
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-12 16:11:22 +01:00
Christian Schmitz f488429380
llama : update outdated comment in llama.h (#19428)
* Updated documentation

Model is no longer a parameter

* llama : fix trailing whitespace in comment

---------

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
2026-02-12 15:52:57 +01:00
Aleksander Grygier 4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat (#19556)
* feat: Enable adding System Prompt per-chat

* fix: Save draft message in Chat Form when adding System Prompt from new chat view

* fix: Proper system message deletion logic

* chore: Formatting

* chore: update webui build output
2026-02-12 13:56:08 +01:00
Daniel Bevenius ff599039a9
scripts : add support for forks in pr2wt.sh (#19540)
This commit adds support for using the pr2wt.sh (pull request to
workspace) script with forks of upstream llama.cpp.
2026-02-12 13:14:28 +01:00
Aleksander Grygier f486ce9f30
(webui) REFACTOR: UI primitives and polish (#19551)
* webui: UI primitives and polish (non-MCP)

* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier 38adc7d469
WebUI Architecture Cleanup (#19541)
* webui: architecture foundation (non-MCP core refactors)

* chore: update webui build output
2026-02-12 11:22:27 +01:00
Georgi Gerganov 3b3a948134
metal : update sum_rows kernel to support float4 (#19524) 2026-02-12 11:35:28 +02:00
Mario Limonciello 6845f7f87f
Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (#19461)
There is an upstream problem [1] with AMD's LLVM 22 fork and
rocWMMA 2.2.0 causing compilation issues on devices without
native fp16 support (CDNA devices).

The specialized types aren't resolved properly:
```
/opt/rocm/include/rocwmma/internal/mfma_impl.hpp:2549:37: error: ambiguous partial specializations of 'amdgcn_mfma<__half, __half, __half, 16, 16, 16>'
 2549 |             using ARegsT = typename Impl::ARegsT;
```

Add a workaround to explicitly declare the types and cast when
compiling with HIP and ROCWMMA_FATTN [2].  When this is actually
fixed upstream some guards can be used to detect and wrap the
version that has the fix to only apply when necessary.

Link: https://github.com/ROCm/rocm-libraries/issues/4398 [1]
Link: https://github.com/ggml-org/llama.cpp/issues/19269 [2]

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2026-02-12 09:38:35 +01:00
RichardScottOZ fa16e517a3
server : fix typo in README.md for features list (#19510)
extra l for full
2026-02-12 08:56:25 +01:00
TriDefender 313493de53
docs : update path in snapdragon README.md (#19533)
paths changed so original example didn't work
2026-02-12 08:13:51 +01:00
Max Krasnyansky b1ff83bbb0
hexagon: further optimization and tuning of matmul and dot kernels (#19407)
* ggml-hexagon: implement 2x2 matmul kernel

* hexmm: implement vec_dot_rx2x2 for Q8_0 and MXFP4

* hexagon: fix editor config failures

* hexagon: refactor matmul ops to use context struct and remove wrappers

Also implement vec_dot_f16 2x2

* hexagon: refactor dyn quantizers to use mmctx

* hexagon: remove mm fastdiv from op_ctx

* hexagon: refactor matmul entry point to reduce code duplication

---------

Co-authored-by: Trivikram Reddy <tamarnat@qti.qualcomm.com>
2026-02-11 23:04:27 -08:00
Adrien Gallouët 4ae1b7517a
common : replace deprecated codecvt using parse_utf8_codepoint (#19517)
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2026-02-12 07:27:52 +01:00
Piotr Wilkin 3096ecaa95 One more crazy spacing out 2026-02-11 23:44:52 +01:00
Piotr Wilkin e40d4cd706 Get rid of some crazy formatting 2026-02-11 22:53:02 +01:00
Piotr Wilkin 56ca124850 Document helpers 2026-02-11 22:42:16 +01:00
lhez 4d3daf80f8
opencl: add general Q6_K mm and Q4_K mv (#19347)
* opencl: add general q6_k mm

* opencl: refine condition for q6_K mm

* opencl: add general q4_K mv

* opencl: fix whitespace
2026-02-11 10:33:13 -08:00
Georgi Gerganov 914dde72ba
ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511)
* ggml : unary ops support non-cont src0

* metal : support F16 unary ops + fix ELU
2026-02-11 18:58:43 +02:00