Commit Graph

8112 Commits

Author SHA1 Message Date
Aleksander Grygier 82f26ad8e4 refactor: Cleanup 2026-01-26 15:33:27 +01:00
Aleksander Grygier 5bf1c86635 refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
2026-01-26 15:28:50 +01:00
Sigbjørn Skjæret 142cbe2ac6
ci : use new 1vCPU runner for lightweight jobs (#19107)
* use new 1vCPU runner for lightweight jobs

* pyright is too heavy, look into ty some day

use new pip-install input
2026-01-26 15:22:49 +01:00
Aleksander Grygier 7b127db90c chore: update webui build output 2026-01-26 15:07:47 +01:00
Aleksander Grygier 717a868c23 feat: Mcp Server Selector 2026-01-26 15:03:05 +01:00
Aleksander Grygier e566d6641e fix: Scroll issues in DropdownMenuSearchable 2026-01-26 14:41:15 +01:00
Aleksander Grygier d675f403e3 chore: update webui build output 2026-01-26 14:33:58 +01:00
Aleksander Grygier ee0f0b277f feat: Improve Code blocks rendering + add auto scroll + improve global scroll bar behavior 2026-01-26 14:32:40 +01:00
Aleksander Grygier 6586ae71d2 chore: update webui build output 2026-01-26 12:34:21 +01:00
Aleksander Grygier c631e26a3f refactor: Components imports/exports structure & documentation 2026-01-26 12:30:53 +01:00
Georgi Gerganov 56f3ebf38e
model : add correct type for GLM 4.7 Flash (#19106) 2026-01-26 11:24:30 +02:00
Aleksander Grygier b7d1de68c3 refactor: Cleanup 2026-01-26 09:54:44 +01:00
Aleksander Grygier 0a66568fc9 chore: update webui build output 2026-01-26 09:37:27 +01:00
Aleksander Grygier fa0cad2e6e refactor: Componentize Chat Form Prompt Picker 2026-01-26 09:36:13 +01:00
Aleksander Grygier 176abf3175 refactor: Utility function 2026-01-26 09:00:41 +01:00
Aleksander Grygier 5ee232d81c refactor: Use store methods 2026-01-26 08:52:57 +01:00
Johannes Gäßler 0c21677e43
CUDA: faster FA for GQA > 1 but not power of 2 (#19092) 2026-01-25 21:19:47 +01:00
ccbinn 0440bfd160
metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088)
Co-authored-by: chenbin11 <chenbin11@kuaishou.com>
2026-01-25 20:07:19 +02:00
Sigbjørn Skjæret 0bf5636938
convert : yield Gemma3N custom_map tensors directly (#19091) 2026-01-25 18:03:34 +01:00
Aman Gupta bcb43163ae
ggml-cpu: Use tiled FA for prompt-processing (#19012)
* ggml-cpu: Use tiled FA for prompt-processing

FA performance on CPU degrades on long contexts because it essentially uses a vector kernel. This PR adds a tiled FA for prompt processing (PP). Tile sizes were tuned on a single-socket, 64-core AMD EPYC machine.

* fix out of bounds for mask

* skip rows where all entries are masked

* skip tile if mask is inf

* store mask in worksize

* check inf tile earlier
2026-01-25 23:25:58 +08:00
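
The tiling and mask-skipping idea in the commit above can be illustrated with a minimal sketch. This is not the ggml-cpu kernel: the function names, the single-query layout, and the tile size of 2 are assumptions made for illustration, and the sketch assumes at least one key is unmasked. It processes keys one tile at a time with a running softmax and skips any tile whose mask is entirely -inf.

```python
import math

NEG_INF = float("-inf")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_naive(q, keys, values, mask):
    """Reference: softmax(q.k + mask) weighted sum of values for one query row."""
    scores = [dot(q, k) + m for k, m in zip(keys, mask)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_tiled(q, keys, values, mask, tile=2):
    """Same result, computed tile by tile with a running (online) softmax."""
    dim = len(values[0])
    run_max = NEG_INF   # largest score seen so far
    run_sum = 0.0       # softmax denominator relative to run_max
    acc = [0.0] * dim   # weighted value sum relative to run_max
    for start in range(0, len(keys), tile):
        end = min(start + tile, len(keys))
        if all(m == NEG_INF for m in mask[start:end]):
            continue  # fully masked tile contributes nothing, so skip it
        scores = [dot(q, keys[i]) + mask[i] for i in range(start, end)]
        new_max = max(run_max, max(scores))
        # Rescale earlier partial results to the new maximum.
        scale = math.exp(run_max - new_max)
        run_sum *= scale
        acc = [a * scale for a in acc]
        for i, s in zip(range(start, end), scores):
            w = math.exp(s - new_max)  # masked entries give exp(-inf) == 0.0
            run_sum += w
            for d in range(dim):
                acc[d] += w * values[i][d]
        run_max = new_max
    return [a / run_sum for a in acc]
```

Both functions should agree up to floating-point rounding; the tiled version only touches one tile of keys and values at a time, which is what makes it friendlier to caches on long contexts.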
Georgi Gerganov d9c6ce46f7
kv-cache : support V-less cache (#19067)
* kv-cache : support V-less cache

* cuda : better check for V_is_K_view

* cuda : improve V_is_K_view check

* graph : add comments

* hparams : refactor
2026-01-25 15:48:56 +02:00
Aleksander Grygier ff0e927be2 chore: update webui build output 2026-01-25 13:38:25 +01:00
Aleksander Grygier ee9efae203 refactor: Enums 2026-01-25 13:37:08 +01:00
Aleksander Grygier 7f5284d597 refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
refactor: Cleanup
2026-01-25 13:13:11 +01:00
Sigbjørn Skjæret 70d860824a
convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe (#19084)
* fix Gemma3N and Ernie4.5Moe

* fix GraniteMoe
2026-01-25 13:05:05 +01:00
Georgi Gerganov 080b161995
completion : fix prompt cache for recurrent models (#19045) 2026-01-25 09:12:50 +02:00
Molly Sophia 1243f93a2d
readme: update RWKV7 model links (#19061)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2026-01-25 09:11:19 +02:00
Jakkala Mahesh 24bc238303
llama: fix integer type consistency in split helpers (#18894)
* llama: fix integer type consistency in split helpers

* llama: apply minor style fixes

* llama: remove trailing whitespace
2026-01-25 09:10:52 +02:00
Daniel Bevenius 16639ba217
common : use two decimal places for float arg help messages (#19048)
* common : use two decimal places for float arg help messages

This commit updates the help messages for various command-line arguments
in arg.cpp to display floating-point default values with two decimal
places instead of one.

The motivation for this change is that showing only one decimal place
means that values generated using --help or llama-gen-docs do not
display the correct values.

For example, the value of top-p in tools/server/README.md is currently
`0.9`, but the default value is actually `0.95`. Running llama-gen-docs
does not update this value because it uses the output of the help
message, which shows only one decimal place, so the value appears
unchanged.

* docs : run llama-gen-docs to update docs
2026-01-25 07:31:42 +01:00
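
The effect described in the commit above can be reproduced with a short sketch. The helper name `format_default` is hypothetical and this is Python rather than the C++ in arg.cpp; it only shows how a fixed number of decimal places changes the printed default.

```python
def format_default(value: float, decimals: int) -> str:
    """Format a default value with a fixed number of decimal places."""
    return f"{value:.{decimals}f}"


one_place = format_default(0.95, 1)   # what a one-decimal help message shows
two_places = format_default(0.95, 2)  # what a two-decimal help message shows
print(one_place, two_places)
```

With one decimal place the top-p default of 0.95 prints as 0.9, which matches the stale README value mentioned in the commit message; with two places it prints as 0.95.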
Bartowski 9981c30130
convert : fix conversion for inheriting models that were bypassing modify_tensors (#19064)
* Add undo_permute = False where needed

* Replace super().modify_tensors with ModelBase

* Add one more ModelBase.modify_tensors

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-25 02:36:47 +01:00
Aleksander Grygier 97642211a9 chore: update webui build output 2026-01-25 02:10:25 +01:00
Aleksander Grygier fc377123b7 refactor: Simplify MCP errors 2026-01-25 02:09:12 +01:00
Aleksander Grygier 202262c2dc chore: update webui build output 2026-01-25 01:44:14 +01:00
Aleksander Grygier b58b823b57 refactor: Types 2026-01-25 01:39:49 +01:00
Aleksander Grygier ba39f8cc7b chore: update webui build output 2026-01-25 01:21:34 +01:00
Aleksander Grygier 9bcfdc3483 refactor: DRY 2026-01-25 01:17:59 +01:00
Aleksander Grygier e7ff091881
chore: Add deprecation comment 2026-01-25 01:05:28 +01:00
Aleksander Grygier 1c843b2863 chore: update webui build output 2026-01-25 01:04:34 +01:00
Aleksander Grygier 5dfc520d67 refactor: Cleanup 2026-01-25 00:48:21 +01:00
Aleksander Grygier 6daa39994c refactor: Naming & Enums 2026-01-25 00:32:37 +01:00
Aleksander Grygier 2562dc50bd chore: update webui build output 2026-01-25 00:32:16 +01:00
Aleksander Grygier 372202632e refactor: Cleanup 2026-01-25 00:31:49 +01:00
Aleksander Grygier ba230c5cce refactor: Naming + remove redundant component 2026-01-24 23:58:17 +01:00
Aleksander Grygier f7b5f62586 refactor: Remove unused code 2026-01-24 23:45:06 +01:00
Aleksander Grygier 22d9e645aa chore: update webui build output 2026-01-24 23:39:04 +01:00
Aleksander Grygier d938994395 refactor: Cleanup 2026-01-24 23:38:37 +01:00
Johannes Gäßler e9fd8dcab4
llama-fit-params: keep explicit --ctx-size 0 (#19070) 2026-01-24 22:13:08 +01:00
Johannes Gäßler 4e5b83b226
GGUF: check that tensor size is representable (#19072) 2026-01-24 21:57:51 +01:00
Aleksander Grygier fc4c392dce chore: update webui build output 2026-01-24 20:54:24 +01:00
Aleksander Grygier 79e606eb99 refactor: Constants 2026-01-24 20:52:19 +01:00