Aleksander Grygier
82f26ad8e4
refactor: Cleanup
2026-01-26 15:33:27 +01:00
Aleksander Grygier
5bf1c86635
refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
2026-01-26 15:28:50 +01:00
Sigbjørn Skjæret
142cbe2ac6
ci : use new 1vCPU runner for lightweight jobs (#19107)
* use new 1vCPU runner for lightweight jobs
* pyright is too heavy, look into ty some day
use new pip-install input
2026-01-26 15:22:49 +01:00
Aleksander Grygier
7b127db90c
chore: update webui build output
2026-01-26 15:07:47 +01:00
Aleksander Grygier
717a868c23
feat: Mcp Server Selector
2026-01-26 15:03:05 +01:00
Aleksander Grygier
e566d6641e
fix: Scroll issues in DropdownMenuSearchable
2026-01-26 14:41:15 +01:00
Aleksander Grygier
d675f403e3
chore: update webui build output
2026-01-26 14:33:58 +01:00
Aleksander Grygier
ee0f0b277f
feat: Improve Code blocks rendering + add auto scroll + improve global scroll bar behavior
2026-01-26 14:32:40 +01:00
Aleksander Grygier
6586ae71d2
chore: update webui build output
2026-01-26 12:34:21 +01:00
Aleksander Grygier
c631e26a3f
refactor: Components imports/exports structure & documentation
2026-01-26 12:30:53 +01:00
Georgi Gerganov
56f3ebf38e
model : add correct type for GLM 4.7 Flash (#19106)
2026-01-26 11:24:30 +02:00
Aleksander Grygier
b7d1de68c3
refactor: Cleanup
2026-01-26 09:54:44 +01:00
Aleksander Grygier
0a66568fc9
chore: update webui build output
2026-01-26 09:37:27 +01:00
Aleksander Grygier
fa0cad2e6e
refactor: Componentize Chat Form Prompt Picker
2026-01-26 09:36:13 +01:00
Aleksander Grygier
176abf3175
refactor: Utility function
2026-01-26 09:00:41 +01:00
Aleksander Grygier
5ee232d81c
refactor: Use store methods
2026-01-26 08:52:57 +01:00
Johannes Gäßler
0c21677e43
CUDA: faster FA for GQA > 1 but not power of 2 (#19092)
2026-01-25 21:19:47 +01:00
ccbinn
0440bfd160
metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088)
Co-authored-by: chenbin11 <chenbin11@kuaishou.com>
2026-01-25 20:07:19 +02:00
Sigbjørn Skjæret
0bf5636938
convert : yield Gemma3N custom_map tensors directly (#19091)
2026-01-25 18:03:34 +01:00
Aman Gupta
bcb43163ae
ggml-cpu: Use tiled FA for prompt-processing (#19012)
* ggml-cpu: Use tiled FA for prompt-processing
The FA performance is poor on CPU for long contexts because it essentially uses a vector kernel. This PR adds a tiled FA kernel for prompt processing (PP). Tile sizes were tuned on a single-socket 64-core AMD EPYC machine.
* fix out of bounds for mask
* skip rows that are fully masked
* skip tile if mask is inf
* store mask in worksize
* check inf tile earlier
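A minimal sketch of the idea behind a tiled FA for prompt processing, assuming single-head attention with row-major `float` buffers and an additive mask (0 or -inf). This is not the ggml-cpu kernel: the names `attn_naive` and `attn_tiled` and the tile sizes `tq`/`tk` are hypothetical. It shows how processing a block of query rows against a block of K/V rows with an online softmax (a running max and running sum per row) reproduces the row-at-a-time result, and how a tile whose mask is entirely -inf can be skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major buffers: q is n_q x d, k and v are n_kv x d, mask is n_q x n_kv
// (additive, 0 or -INFINITY). Output is n_q x d; fully masked rows are zeros.
using Mat = std::vector<float>;

// Reference: one query row at a time with a full softmax (vector-style).
Mat attn_naive(const Mat& q, const Mat& k, const Mat& v, const Mat& mask,
               size_t n_q, size_t n_kv, size_t d, float scale) {
    Mat out(n_q * d, 0.0f);
    std::vector<float> s(n_kv);
    for (size_t i = 0; i < n_q; ++i) {
        float m = -INFINITY;
        for (size_t j = 0; j < n_kv; ++j) {
            float dot = 0.0f;
            for (size_t c = 0; c < d; ++c) dot += q[i*d + c] * k[j*d + c];
            s[j] = dot * scale + mask[i*n_kv + j];
            m = std::max(m, s[j]);
        }
        if (m == -INFINITY) continue; // fully masked row stays zero
        float l = 0.0f;
        for (size_t j = 0; j < n_kv; ++j) {
            const float p = std::exp(s[j] - m);
            l += p;
            for (size_t c = 0; c < d; ++c) out[i*d + c] += p * v[j*d + c];
        }
        for (size_t c = 0; c < d; ++c) out[i*d + c] /= l;
    }
    return out;
}

// Tiled: a tile of tq query rows is processed against tiles of tk K/V rows,
// so each K/V tile is reused across several query rows.
Mat attn_tiled(const Mat& q, const Mat& k, const Mat& v, const Mat& mask,
               size_t n_q, size_t n_kv, size_t d, float scale,
               size_t tq, size_t tk) {
    assert(q.size() == n_q*d && k.size() == n_kv*d && v.size() == n_kv*d);
    assert(mask.size() == n_q*n_kv && tq > 0 && tk > 0);
    Mat out(n_q * d, 0.0f);
    for (size_t i0 = 0; i0 < n_q; i0 += tq) {
        const size_t i1 = std::min(i0 + tq, n_q);
        // running max, running sum and unnormalised accumulator per query row
        std::vector<float> m(i1 - i0, -INFINITY), l(i1 - i0, 0.0f);
        Mat acc((i1 - i0) * d, 0.0f);
        for (size_t j0 = 0; j0 < n_kv; j0 += tk) {
            const size_t j1 = std::min(j0 + tk, n_kv);
            // skip the whole Q x KV tile if every mask entry in it is -inf
            bool all_masked = true;
            for (size_t i = i0; i < i1 && all_masked; ++i)
                for (size_t j = j0; j < j1; ++j)
                    if (mask[i*n_kv + j] != -INFINITY) { all_masked = false; break; }
            if (all_masked) continue;
            std::vector<float> s(j1 - j0);
            for (size_t i = i0; i < i1; ++i) {
                const size_t r = i - i0;
                float m_new = m[r];
                for (size_t j = j0; j < j1; ++j) {
                    float dot = 0.0f;
                    for (size_t c = 0; c < d; ++c) dot += q[i*d + c] * k[j*d + c];
                    s[j - j0] = dot * scale + mask[i*n_kv + j];
                    m_new = std::max(m_new, s[j - j0]);
                }
                if (m_new == -INFINITY) continue; // row still fully masked
                // rescale earlier partial sums to the new running max
                const float corr = std::exp(m[r] - m_new);
                l[r] *= corr;
                for (size_t c = 0; c < d; ++c) acc[r*d + c] *= corr;
                for (size_t j = j0; j < j1; ++j) {
                    const float p = std::exp(s[j - j0] - m_new);
                    l[r] += p;
                    for (size_t c = 0; c < d; ++c) acc[r*d + c] += p * v[j*d + c];
                }
                m[r] = m_new;
            }
        }
        for (size_t i = i0; i < i1; ++i) {
            const size_t r = i - i0;
            if (l[r] == 0.0f) continue; // fully masked row stays zero
            for (size_t c = 0; c < d; ++c) out[i*d + c] = acc[r*d + c] / l[r];
        }
    }
    return out;
}
```

On the same inputs, `attn_tiled(q, k, v, mask, n_q, n_kv, d, scale, 2, 3)` should match `attn_naive` up to float rounding. In a real kernel the score block for a tile would typically be computed with a small matrix multiply; the scalar loops here only illustrate the bookkeeping.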
2026-01-25 23:25:58 +08:00
Georgi Gerganov
d9c6ce46f7
kv-cache : support V-less cache (#19067)
* kv-cache : support V-less cache
* cuda : better check for V_is_K_view
* cuda : improve V_is_K_view check
* graph : add comments
* hparams : refactor
2026-01-25 15:48:56 +02:00
Aleksander Grygier
ff0e927be2
chore: update webui build output
2026-01-25 13:38:25 +01:00
Aleksander Grygier
ee9efae203
refactor: Enums
2026-01-25 13:37:08 +01:00
Aleksander Grygier
7f5284d597
refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
2026-01-25 13:13:11 +01:00
Sigbjørn Skjæret
70d860824a
convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084)
* fix Gemma3N and Ernie4.5Moe
* fix GraniteMoe
2026-01-25 13:05:05 +01:00
Georgi Gerganov
080b161995
completion : fix prompt cache for recurrent models (#19045)
2026-01-25 09:12:50 +02:00
Molly Sophia
1243f93a2d
readme: update RWKV7 model links (#19061)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2026-01-25 09:11:19 +02:00
Jakkala Mahesh
24bc238303
llama: fix integer type consistency in split helpers (#18894)
* llama: fix integer type consistency in split helpers
* llama: apply minor style fixes
* llama: remove trailing whitespace
2026-01-25 09:10:52 +02:00
Daniel Bevenius
16639ba217
common : use two decimal places for float arg help messages (#19048)
* common : use two decimal places for float arg help messages
This commit updates the help messages for various command-line arguments
in arg.cpp to display floating-point default values with two decimal
places instead of one.
The motivation for this change is that, with only one decimal place,
the values shown by --help or generated by llama-gen-docs can be
incorrect.
For example, the value of top-p in tools/server/README.md is currently
`0.9`, but the default value is actually `0.95`. Running llama-gen-docs
does not update this value because it uses the output of the help
message, which shows only one decimal place, so the value appears
unchanged.
* docs : run llama-gen-docs to update docs
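To make the rounding issue concrete, a small sketch that formats a float default with a configurable number of decimal places. `format_default` is a hypothetical helper, not the code in arg.cpp, and it assumes the default (such as top-p) is stored as a `float`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not the actual arg.cpp code): render a float default
// for a help message with a fixed number of decimal places.
std::string format_default(float value, int decimals) {
    assert(decimals >= 0);
    char buf[64];
    // the float is promoted to double for the variadic call;
    // "%.*f" takes the precision as an int argument
    std::snprintf(buf, sizeof(buf), "%.*f", decimals, value);
    return std::string(buf);
}
```

With this helper, `format_default(0.95f, 1)` gives `0.9`, because the nearest `float` to 0.95 is 0.949999988..., which rounds down at one decimal place, while `format_default(0.95f, 2)` gives `0.95`.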
2026-01-25 07:31:42 +01:00
Bartowski
9981c30130
convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064)
* Add undo_permute = False where needed
* Replace super().modify_tensors with ModelBase
* Add one more ModelBase.modify_tensors
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-25 02:36:47 +01:00
Aleksander Grygier
97642211a9
chore: update webui build output
2026-01-25 02:10:25 +01:00
Aleksander Grygier
fc377123b7
refactor: Simplify MCP errors
2026-01-25 02:09:12 +01:00
Aleksander Grygier
202262c2dc
chore: update webui build output
2026-01-25 01:44:14 +01:00
Aleksander Grygier
b58b823b57
refactor: Types
2026-01-25 01:39:49 +01:00
Aleksander Grygier
ba39f8cc7b
chore: update webui build output
2026-01-25 01:21:34 +01:00
Aleksander Grygier
9bcfdc3483
refactor: DRY
2026-01-25 01:17:59 +01:00
Aleksander Grygier
e7ff091881
chore: Add deprecation comment
2026-01-25 01:05:28 +01:00
Aleksander Grygier
1c843b2863
chore: update webui build output
2026-01-25 01:04:34 +01:00
Aleksander Grygier
5dfc520d67
refactor: Cleanup
2026-01-25 00:48:21 +01:00
Aleksander Grygier
6daa39994c
refactor: Naming & Enums
2026-01-25 00:32:37 +01:00
Aleksander Grygier
2562dc50bd
chore: update webui build output
2026-01-25 00:32:16 +01:00
Aleksander Grygier
372202632e
refactor: Cleanup
2026-01-25 00:31:49 +01:00
Aleksander Grygier
ba230c5cce
refactor: Naming + remove redundant component
2026-01-24 23:58:17 +01:00
Aleksander Grygier
f7b5f62586
refactor: Remove unused code
2026-01-24 23:45:06 +01:00
Aleksander Grygier
22d9e645aa
chore: update webui build output
2026-01-24 23:39:04 +01:00
Aleksander Grygier
d938994395
refactor: Cleanup
2026-01-24 23:38:37 +01:00
Johannes Gäßler
e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 (#19070)
2026-01-24 22:13:08 +01:00
Johannes Gäßler
4e5b83b226
GGUF: check that tensor size is representable (#19072)
2026-01-24 21:57:51 +01:00
Aleksander Grygier
fc4c392dce
chore: update webui build output
2026-01-24 20:54:24 +01:00
Aleksander Grygier
79e606eb99
refactor: Constants
2026-01-24 20:52:19 +01:00