Chris Peterson
2aa45ef9e3
llama: Include algorithm header needed for C++23 ( #18078 )
2025-12-16 09:37:55 +02:00
Georgi Gerganov
c560316440
graph : reuse SSM graphs ( #16490 )
...
* graph : reuse hybrid graphs
* graph : reuse recurrent graphs
* graph : fix reuse check for recurrent inputs
* memory : move the recurrent state into the memory context
* Revert "memory : move the recurrent state into the memory context"
This reverts commit 00f115fe810815d4a22a6dee0acc346131e970e1.
* cont : fix build
2025-12-16 09:36:21 +02:00
Sigbjørn Skjæret
d6742125c3
ci : separate webui from server ( #18072 )
...
* separate webui from server
* add public to path
2025-12-16 08:17:26 +01:00
Aleksander Grygier
3034836d36
webui: Improve copy to clipboard with text attachments ( #17969 )
...
* feat: Create copy/paste user message including "pasted text" attachments
* chore: update webui build output
* chore: update webui static output
* fix: UI issues
* chore: update webui static output
* fix: Decode HTML entities using `DOMParser`
* chore: update webui build output
* chore: update webui static output
2025-12-16 07:38:46 +01:00
Aleksander Grygier
a20979d433
webui: Add setting to always show sidebar on Desktop ( #17809 )
...
* feat: Add setting to always show Sidebar on Desktop
* chore: update webui build output
* feat: Add auto-show sidebar setting
* fix: Mobile settings dialog UI
* chore: update webui build output
* feat: UI label update
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
2025-12-16 07:31:37 +01:00
Daniel Bevenius
2995341730
llama : add support for NVIDIA Nemotron 3 Nano ( #18058 )
...
* llama : add support for NVIDIA Nemotron 3 Nano
This commit adds support for the NVIDIA Nemotron 3 Nano model, enabling
the conversion and running of this model.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 07:19:26 +01:00
Darius Lukas
40d9c394f4
Webui: Disable attachment button and model selector button when prompt textbox is disabled. ( #17925 )
...
* Pass disabled state to the file attachments button and the model
selector button.
* Update index.html.gz
* Fix model info card in non-router mode.
* Update index.html.gz
2025-12-16 07:15:49 +01:00
ddh0
fcb5129086
remove debug logging, explicitly clamp params at init
2025-12-15 21:42:29 -06:00
ddh0
85b6e52e39
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-15 21:23:25 -06:00
ddh0
1c2d2e900d
simplify target computation
...
last commit with debug logging!
2025-12-15 21:02:11 -06:00
Sigbjørn Skjæret
d6a1e18c65
convert : move rope_parameters to TextModel class ( #18061 )
...
* make sure to search text_config for rope parameters
* move rope_parameters to TextModel class
2025-12-15 22:03:16 +01:00
Shouyu
c45f89d551
ggml-hexagon: mm for mtmd ( #17894 )
...
* feat: add run_mtmd script for hexagon
* fix: fix issue in fp16xfp32 mm
* fix: remove opt_experiment for fp16xfp32 mm
* fix: ggml-hexagon: matmul fp16xfp32 support non-contiguous src0
* fix: fix syntax check for run-mtmd.sh for cli
2025-12-15 10:53:56 -08:00
HelloKS
9d52f17ae3
model : add KORMo model ( #18032 )
...
* vocab: add KORMo Tokenizer
* model: add KORMoForCausalLM
* vocab: change pretokenizer to qwen2
* lint: fix unintended line removal
* model: make qwen2 bias tensor optional
* model: use qwen2 architecture for KORMo
2025-12-15 18:51:43 +01:00
ssweens
4529c660c8
kv-cache: Fix state restore fragmented cache ( #17982 )
...
* kv-cache : fix state restore with fragmented cache (#17527 )
Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.
* tests : update logic
* cleanup: tightened state_read_meta sig, added is_contiguous case
* fix: state_read_meta arg reorder loose ends
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-15 19:28:35 +02:00
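The difference between contiguous and non-contiguous slot search described in #17982 can be illustrated with a minimal sketch. This is not the actual `find_slot` implementation: the `find_free_cells` helper, its signature, and the `used` bitmap are hypothetical stand-ins chosen only to show why a fragmented cache can reject a contiguous request yet still satisfy a non-contiguous one.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a KV cache: used[i] is true when cell i is occupied.
// Returns the indices of n free cells, or an empty vector if the request cannot
// be satisfied. With contiguous == true the cells must form one unbroken run;
// otherwise any free cells are accepted, scanning from the start of the cache.
std::vector<std::size_t> find_free_cells(const std::vector<bool> & used,
                                         std::size_t n,
                                         bool contiguous) {
    std::vector<std::size_t> out;
    if (n == 0) {
        return out;
    }
    for (std::size_t i = 0; i < used.size(); ++i) {
        if (used[i]) {
            if (contiguous) {
                out.clear(); // the run is broken, start collecting again
            }
            continue;
        }
        out.push_back(i);
        if (out.size() == n) {
            return out;
        }
    }
    out.clear(); // not enough free cells under the requested policy
    return out;
}
```

With a fragmented bitmap such as `{used, free, used, free, free, used, free}`, a contiguous request for three cells fails because the longest free run is two cells, while the non-contiguous request succeeds by collecting free cells across the gaps.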
ddh0
0344068cf1
remove extraneous logging
2025-12-15 09:35:44 -06:00
Pascal
0f4f35e7be
Fix unreadable user markdown colors and truncate long texts in deletion dialogs ( #17555 )
...
* webui: limit conversation name length in dialogs
* webui: fix unreadable colors on links and table cell hover in user markdown
* webui: keep table borders visible in user markdown
* webui: updating unified exports
* Update tools/server/webui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentThumbnailFile.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-12-15 16:34:53 +01:00
ddh0
9c50b573f5
improve logging messages in llama_sampler_power_law
2025-12-15 09:25:05 -06:00
ddh0
6e66095e1f
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-15 09:07:13 -06:00
Jeremy Demeule
165caaf5fb
metal: use shared buffers on eGPU ( #17866 )
...
* metal: use shared buffers on eGPU
With #15906, I noticed an important regression when using the Metal backend on an eGPU.
This commit restores the previous behavior and adds an option to force its activation.
* metal: use shared buffers on eGPU
* metal: use shared buffers on eGPU
2025-12-15 16:14:49 +02:00
Xuan-Son Nguyen
96a181a933
mtmd: refactor audio preprocessing ( #17978 )
...
* mtmd: refactor audio preprocessing
* refactor
Co-authored-by: Tarek <tdakhran@users.noreply.github.com>
* wip
* wip (2)
* improve constructor
* fix use_natural_log
* fix padding for short input
* clean up
* remove need_chunking
---------
Co-authored-by: Tarek <tdakhran@users.noreply.github.com>
2025-12-15 14:16:52 +01:00
Andrew Aladjev
4a4f7e6550
cli: fixed dead links to tools/main for cli and completion, fixed code owners ( #17993 )
...
Co-authored-by: Andrew Aladjev <andrew.aladjev@gmail.com>
2025-12-15 11:47:04 +01:00
Thomas Jarosch
e73d548659
webui: add "delete all conversations" button to import/export tab ( #17444 )
...
* webui: add "delete all conversations" button to import/export tab
- Add 'Delete all conversations' functionality with confirmation dialog
- Add Trash icon and destructive styling for clear visual indication
- Redirects to "?new_chat=true#/" by using conversationsStore.deleteAll()
* chore: update webui build output
2025-12-15 11:29:29 +01:00
Johannes Gäßler
b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization ( #16653 )
...
* llama: automatically fit args to free memory
llama-fit-params tool
* fix CI
* hints for bug reports, ensure no reallocation
* fix segfault with Vulkan
* add llama-fit-params to CI
* fix CI
* fix CI
* fix CI
* minor adjustments
* fix assignment of 1 dense layer
* fix logger not being reset on model load failure
* remove --n-gpu-layer hint on model load failure
* fix llama-fit-params verbosity
* fix edge case
* fix typo [no ci]
2025-12-15 09:24:59 +01:00
ddh0
4e04bd1ce2
log sampler init values
2025-12-14 23:14:51 -06:00
ddh0
1c58e9a96a
add power law to the new `samplers` vector
2025-12-14 22:32:27 -06:00
ddh0
4e28eb2ffe
format (double)
2025-12-14 22:11:34 -06:00
ddh0
b5ed673ce9
fix logging
2025-12-14 22:08:36 -06:00
ddh0
68543257e9
update default decay to 0.9
2025-12-14 22:03:17 -06:00
ddh0
493bf301ff
silence `missing initializer for member`
2025-12-14 21:55:45 -06:00
ddh0
f5d08724e7
fix bad merge
...
my git skills are lacking
2025-12-14 21:51:59 -06:00
Neo Zhang Jianyu
4aced7a631
[SYCL] Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai ( #17826 )
...
* support gpt-oss GPU by OP add-id, mul_mat for mxfp4, swiglu_oai, fix warning
* fix fault ut case, update ops.md
* rebase, fix format issue
2025-12-15 10:35:15 +08:00
piDack
745fa0e78b
model : add glm-asr support ( #17901 )
...
* [model] add glm-asr support
* fix format for ci
* fix convert format for ci
* update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review
* check root architecture for convert hf script
* fix conflict with upstream
* fix convert script for glm asr & format clip-impl
* format
* restore hparams text
* improved conversion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-12-15 03:18:46 +01:00
ddh0
6934780669
optimize
2025-12-14 16:26:15 -06:00
ddh0
36b526d768
Merge branch 'master' into power-law-sampler
2025-12-14 15:43:49 -06:00
Xuan-Son Nguyen
52392291b2
preset: handle negated arg, reverse the meaning if needed ( #18041 )
2025-12-14 22:08:10 +01:00
Sigbjørn Skjæret
5c8a717128
convert : refactor rope scaling handling ( #18013 )
...
* refactor rope scaling handling
* ws--
* missed a couple
* use find_hparam
2025-12-14 16:04:37 +01:00
Haowei Wu
37f5a1093b
mtmd: enhance image resizing in llava_uhd ( #18014 )
2025-12-14 15:57:52 +01:00
Ruben Ortlam
9e6649ecf2
vulkan: fix mul_mat_vec_iq1_s formatting ( #18026 )
2025-12-14 14:52:46 +01:00
Xuan-Son Nguyen
0759b09c90
graph: add f_attn_temp_offset ( #18025 )
2025-12-14 13:05:59 +01:00
ddh0
667b70fdac
update default decay
2025-12-14 03:41:28 -06:00
ddh0
ec54fe5f14
no, but does this?
2025-12-14 02:54:14 -06:00
Georgi Gerganov
254098a279
common : refactor common_sampler + grammar logic changes ( #17937 )
...
* common : refactor common_sampler + grammar logic changes
* tests : increase max_tokens to get needed response
* batched : fix uninitialized samplers
2025-12-14 10:11:13 +02:00
Jeff Bolz
3238b1400c
vulkan: Fix data race/hang in scalar/cm1 flash attention ( #17887 )
2025-12-14 09:00:00 +01:00
ddh0
2a3f579d1f
does this fix it?
2025-12-14 01:55:02 -06:00
lovedheart
4722671641
vulkan: improve mul_mat_vec_iq1_s speed ( #17874 )
2025-12-14 08:47:49 +01:00
Eve
d15d177f43
vulkan: faster q6_k matmul ( #17813 )
...
* q6_k faster mul mat
* 8 values
* fix comment
* switch to two at a time
* start ci for .glsl files
2025-12-14 08:29:37 +01:00
Georgi Gerganov
77ad8542bd
model-conversion : cast logits to float32 ( #18009 )
2025-12-14 08:58:13 +02:00
ddh0
9613c48172
with logging
2025-12-14 00:36:59 -06:00
Georgi Gerganov
609a2d0268
models : fix YaRN regression + consolidate logic ( #18006 )
...
* models : fix YaRN regression + consolidate logic
* cont : fix the fix
* cont : remove header
* cont : add header
2025-12-14 08:34:56 +02:00
Georgi Gerganov
a63cbafbbc
ggml : arm repack fix build
2025-12-14 08:33:51 +02:00