Georgi Gerganov
080b161995
completion : fix prompt cache for recurrent models (#19045)
2026-01-25 09:12:50 +02:00
Molly Sophia
1243f93a2d
readme: update RWKV7 model links (#19061)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2026-01-25 09:11:19 +02:00
Jakkala Mahesh
24bc238303
llama: fix integer type consistency in split helpers (#18894)
* llama: fix integer type consistency in split helpers
* llama: apply minor style fixes
* llama: remove trailing whitespace
2026-01-25 09:10:52 +02:00
Daniel Bevenius
16639ba217
common : use two decimal places for float arg help messages (#19048)
* common : use two decimal places for float arg help messages
This commit updates the help messages for various command-line arguments
in arg.cpp to display floating-point default values with two decimal
places instead of one.
The motivation for this change is that with only one decimal place, the
values shown by --help, and the docs that llama-gen-docs generates from
that output, do not display the correct defaults.
For example, the value of top-p in tools/server/README.md is currently
`0.9`, but the actual default value is `0.95`. Running llama-gen-docs
does not update this value because it uses the output of the help
message, which shows only one decimal place, so the value appears
unchanged.
* docs : run llama-gen-docs to update docs
2026-01-25 07:31:42 +01:00
Bartowski
9981c30130
convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064)
* Add undo_permute = False where needed
* Replace super().modify_tensors with ModelBase
* Add one more ModelBase.modify_tensors
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-25 02:36:47 +01:00
Aleksander Grygier
97642211a9
chore: update webui build output
2026-01-25 02:10:25 +01:00
Aleksander Grygier
fc377123b7
refactor: Simplify MCP errors
2026-01-25 02:09:12 +01:00
Aleksander Grygier
202262c2dc
chore: update webui build output
2026-01-25 01:44:14 +01:00
Aleksander Grygier
b58b823b57
refactor: Types
2026-01-25 01:39:49 +01:00
Aleksander Grygier
ba39f8cc7b
chore: update webui build output
2026-01-25 01:21:34 +01:00
Aleksander Grygier
9bcfdc3483
refactor: DRY
2026-01-25 01:17:59 +01:00
Aleksander Grygier
e7ff091881
chore: Add deprecation comment
2026-01-25 01:05:28 +01:00
Aleksander Grygier
1c843b2863
chore: update webui build output
2026-01-25 01:04:34 +01:00
Aleksander Grygier
5dfc520d67
refactor: Cleanup
2026-01-25 00:48:21 +01:00
Aleksander Grygier
6daa39994c
refactor: Naming & Enums
2026-01-25 00:32:37 +01:00
Aleksander Grygier
2562dc50bd
chore: update webui build output
2026-01-25 00:32:16 +01:00
Aleksander Grygier
372202632e
refactor: Cleanup
2026-01-25 00:31:49 +01:00
Aleksander Grygier
ba230c5cce
refactor: Naming + remove redundant component
2026-01-24 23:58:17 +01:00
Aleksander Grygier
f7b5f62586
refactor: Remove unused code
2026-01-24 23:45:06 +01:00
Aleksander Grygier
22d9e645aa
chore: update webui build output
2026-01-24 23:39:04 +01:00
Aleksander Grygier
d938994395
refactor: Cleanup
2026-01-24 23:38:37 +01:00
Johannes Gäßler
e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 (#19070)
2026-01-24 22:13:08 +01:00
Johannes Gäßler
4e5b83b226
GGUF: check that tensor size is representable (#19072)
2026-01-24 21:57:51 +01:00
Aleksander Grygier
fc4c392dce
chore: update webui build output
2026-01-24 20:54:24 +01:00
Aleksander Grygier
79e606eb99
refactor: Constants
2026-01-24 20:52:19 +01:00
Aleksander Grygier
3d7426cdd4
refactor: Cleanup
2026-01-24 20:47:32 +01:00
Aleksander Grygier
8bf2d38da1
chore: update webui build output
2026-01-24 20:32:53 +01:00
Aleksander Grygier
14911e51fc
feat: MCP Prompts implementation improvements
2026-01-24 20:30:52 +01:00
Aleksander Grygier
801ef93522
refactor: Message Height CSS Variable
2026-01-24 19:15:38 +01:00
Aleksander Grygier
13f756421c
refactor: Enums
2026-01-24 18:37:43 +01:00
Pascal
85b8da45f9
fix: resolve TypeScript error in tool response content
2026-01-24 18:04:01 +01:00
Xuan-Son Nguyen
bb02f74c61
chat: fix language input for translategemma (#19052)
* chat: fix language input for translategemma
* Update common/chat.cpp
Co-authored-by: Aldehir Rojas <hello@alde.dev>
---------
Co-authored-by: Aldehir Rojas <hello@alde.dev>
2026-01-24 17:58:45 +01:00
Pascal
9ddc54b668
webui: enable vision in agentic tool responses
- Include images from all message roles (not just user)
- Add multipart content support for tool responses
- Images from MCP tools now accessible in same agentic turn
2026-01-24 17:58:20 +01:00
Aleksander Grygier
172e93d494
Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp
2026-01-24 15:13:58 +01:00
Aleksander Grygier
da9c245838
chore: update webui build output
2026-01-24 13:59:52 +01:00
Aleksander Grygier
7c4bedda87
feat: Improve formatting performance time
2026-01-24 13:58:23 +01:00
Aleksander Grygier
c39c6ef436
fix: System prompt sorting
2026-01-24 13:44:41 +01:00
Aleksander Grygier
2601bf0f59
fix: Save draft message in Chat Form when adding System Prompt from new chat view
2026-01-24 13:32:49 +01:00
Aleksander Grygier
a647edfc0b
fix: Chat Form submission
2026-01-24 12:33:24 +01:00
Johannes Gäßler
8f91ca54ec
CUDA: re-use MLA K data for V in MMA FA (#19057)
2026-01-24 10:09:36 +01:00
Aman Gupta
81ab64f3c8
ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)
* ggml-cuda: add split-wise cuda graph
* add n-cpu-moe compare_llama_bench.py
* fix hip/musa builds
2026-01-24 14:25:20 +08:00
nullname
8af1f5f430
ggml-hexagon: flash-attn opt (#19025)
* optimize flash attention kernel by improving score computation and online softmax update
* wip
* Refactor online softmax update in flash attention kernel for improved performance
* Optimize flash attention kernel by replacing float array with HVX_Vector for score computation
* wip
2026-01-23 22:02:07 -08:00
Aleksander Grygier
bd16b6145c
chore: update webui build output
2026-01-24 01:32:36 +01:00
Aleksander Grygier
8428741034
feat: MCP Prompts WIP
2026-01-24 01:26:17 +01:00
Georgi Gerganov
557515be1e
graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)
* graph : avoid branches between embedding and token inputs
* models : make deepstack graphs (e.g. Qwen3 VL) have constant topology
* ci : enable -DGGML_SCHED_NO_REALLOC=ON for server CI
* cont : pad token embeddings to n_embd_inp
2026-01-23 18:22:34 +02:00
Aleksander Grygier
3d88d0b6b2
chore: update webui build output
2026-01-23 15:21:56 +01:00
Aleksander Grygier
9c391d8e0d
feat: UI improvements
2026-01-23 15:21:03 +01:00
Neo Zhang
cb6caca191
[SYCL] use malloc to support both iGPU and dGPU in same time (#18992)
* use malloc to support both iGPU and dGPU in same time
* support windows
---------
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2026-01-23 20:54:10 +08:00
Xuan-Son Nguyen
b5b8fa1c8b
chat : fix translategemma crash on common_chat_format_example (#19019)
2026-01-23 12:03:42 +01:00
Daniel Bevenius
a14b960bc7
model-conversion : use BUILD_DIR variable in all scripts (#19015)
This commit modifies all the utility scripts to use an optional
BUILD_DIR variable/argument to specify the build directory.
The motivation for this is that commit 3d55846a5c ("model-conversion :
add BUILD_DIR variable to run-converted-model scripts") introduced this
variable to the causal and embeddings scripts, but I missed the scripts
in the utils directory.
2026-01-23 09:01:36 +01:00