ddh0
660a3b275f
Merge branch 'ggml-org:master' into power-law-sampler
2026-01-02 17:03:45 -06:00
Anri Lombard
d5574c919c
webui: fix code copy stripping XML/HTML tags ( #18518 )
* webui: fix code copy stripping XML/HTML tags
* webui: update static build
2026-01-01 13:44:11 +01:00
ddh0
2d67b1c008
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-30 13:44:42 -06:00
Jeff Bolz
f14f4e421b
server: fix files built redundantly ( #18474 )
2025-12-30 13:11:13 +01:00
ddh0
05d7dc9e9a
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-29 20:14:39 -06:00
Xuan-Son Nguyen
51a48720b8
webui: fix prompt progress ETA calculation ( #18468 )
* webui: fix prompt progress ETA calculation
* handle case done === 0
2025-12-29 21:42:11 +01:00
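The `done === 0` case handled above is the classic first-chunk pitfall: a naive ETA divides elapsed time by the number of processed tokens, which has no answer before any token is processed. A minimal sketch of such a guard, as a hypothetical C++ helper (not the webui's actual TypeScript code):

```cpp
#include <cassert>

// Hypothetical ETA helper illustrating the fix above.
// done      - prompt tokens processed so far
// total     - total prompt tokens
// elapsed_s - seconds spent so far
// Returns seconds remaining, or -1.0 when no estimate is possible yet.
double eta_seconds(int done, int total, double elapsed_s) {
    if (done >= total) {
        return 0.0;                        // nothing left to process
    }
    if (done <= 0 || elapsed_s <= 0.0) {
        return -1.0;                       // the `done === 0` case: no rate yet
    }
    const double rate = done / elapsed_s;  // tokens per second
    return (total - done) / rate;
}
```

The `-1.0` sentinel stands in for whatever "no ETA yet" representation the UI uses; the point is that the first progress chunk must not divide by zero.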
Pascal
c9a3b40d65
Webui/prompt processing progress ( #18300 )
* webui: display prompt preprocessing progress
* webui: add percentage/ETA and exclude cached tokens from progress
Address review feedback from ngxson
* webui: add minutes and first chunk (0%) case
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: address review feedback from allozaur
* chore: update webui build output
* webui: address review feedback from allozaur
* nit
* chore: update webui build output
* feat: Enhance chat processing state
* feat: Improve chat processing statistics UI
* chore: update webui build output
* feat: Add live generation statistics to processing state hook
* feat: Persist prompt processing stats in hook for better UX
* refactor: Enhance ChatMessageStatistics for live stream display
* feat: Implement enhanced live chat statistics into assistant message
* chore: update webui build output
* fix: Proper tab for each stage of prompt processing/generation
* chore: update webui build output
* fix: Improved ETA calculation & display logic
* chore: update webui build output
* feat: Simplify logic & remove ETA from prompt progress
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-12-29 19:32:21 +01:00
wbtek
5b1248c9af
server : Cmdline arg -to changes http read timeout from current 600sec default ( #18279 )
* Prevent crash if TTFT >300sec, boosted to 90 days
* server : allow configurable HTTP timeouts for child models
* server : pass needed timeouts from params only
---------
Co-authored-by: Greg Slocum <fromgit@wbtek.slocum.net>
2025-12-29 17:12:48 +01:00
Georgi Gerganov
2a85f720b8
server : handle closed connection for tasks ( #18459 )
2025-12-29 15:34:41 +02:00
ddh0
b95b0884dd
update `power-law` -> `adaptive-p`
2025-12-27 02:10:20 -06:00
ddh0
51070e0db7
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-26 10:33:56 -06:00
o7si
4893cc07bb
server : fix crash when seq_rm fails for hybrid/recurrent models ( #18391 )
* server : fix crash when seq_rm fails for hybrid/recurrent models
* server : add allow_processing param to clear_slot
2025-12-26 16:35:29 +01:00
ddh0
ed2890e691
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-25 17:32:29 -06:00
Xuan-Son Nguyen
f5acfb2ffa
server: (router) add stop-timeout option ( #18350 )
* server: (router) add stop-timeout option
* also allow stop while loading
* add docs
* unload_lru: also wait for unload to complete
2025-12-24 23:47:49 +01:00
ddh0
295d1d89dd
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-23 16:51:04 -06:00
Xuan-Son Nguyen
5ee4e43f26
server: return_progress to also report 0% processing state ( #18305 )
2025-12-23 21:49:05 +01:00
Pascal
5b6c9bc0f3
webui: apply webui_settings on first load ( #18223 )
* webui: apply webui_settings on first load
The webui_settings from /props were not applied on the initial load
when default_generation_settings.params was null.
Now the settings sync whenever serverProps is available, regardless of
params; this works for both single-model and router modes.
* chore: update webui build output
2025-12-23 15:48:03 +01:00
Xuan-Son Nguyen
849d021104
server: fix crash with model not having BOS/EOS ( #18321 )
2025-12-23 14:39:36 +01:00
ddh0
6bad4aef77
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-22 14:45:08 -06:00
Xuan-Son Nguyen
179fd82a72
gen-docs: automatically update markdown file ( #18294 )
* gen-docs: automatically update markdown file
* also strip whitespace
* do not add extra newline
* update TOC
2025-12-22 19:30:19 +01:00
Xuan-Son Nguyen
6ce863c803
server: prevent data race from HTTP threads ( #18263 )
* server: prevent data race from HTTP threads
* fix params
* fix default_generation_settings
* nits: make handle_completions_impl looks less strange
* stricter const
* fix GGML_ASSERT(idx < states.size())
* move index to be managed by server_response_reader
* http: make sure req & res lifecycle are tied together
* fix compile
* fix index handling buggy
* fix data race for lora endpoint
* nits: fix shadow variable
* nits: revert redundant changes
* nits: correct naming for json_webui_settings
2025-12-22 14:23:34 +01:00
Xuan-Son Nguyen
3997c78e33
server: fix data race in to_json_anthropic ( #18283 )
2025-12-22 13:21:43 +01:00
Xuan-Son Nguyen
86af848153
server: (docs) remove mention about extra_args ( #18262 )
2025-12-22 12:22:01 +01:00
ddh0
89ebdf00c2
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-20 22:36:35 -06:00
Xuan-Son Nguyen
ddcb75dd8a
server: add auto-sleep after N seconds of idle ( #18228 )
* implement sleeping at queue level
* implement server-context suspend
* add test
* add docs
* optimization: add fast path
* make sure to free llama_init
* nits
* fix use-after-free
* allow /models to be accessed during sleeping, fix use-after-free
* don't allow accessing /models during sleep, it is not thread-safe
* fix data race on accessing props and model_meta
* small clean up
* trailing whitespace
* rm outdated comments
2025-12-21 02:24:42 +01:00
Oleksandr Kuvshynov
408616adbd
server : [easy] fix per round speculative decode logging ( #18211 )
Currently we always log 0, because slot.drafted is cleared before the log statement runs.
To reproduce, run llama-server with Devstral-2 as the main model and
Devstral-Small-2 as the draft model (`-md`), with verbose logging:
```
% ./build/bin/llama-server -v \
    -m ~/llms/Devstral-2-123B-Instruct-2512-UD-Q6_K_XL-00001-of-00003.gguf \
    -md ~/llms/Devstral-Small-2-24B-Instruct-2512-UD-Q2_K_XL.gguf \
    -c 8192 2> /tmp/llama.cpp.debug
```
Check the log:
```
slot update_slots: id 3 | task 0 | accepted 11/0 draft tokens, new n_tokens = 741
slot update_slots: id 3 | task 0 | accepted 4/0 draft tokens, new n_tokens = 746
slot update_slots: id 3 | task 0 | accepted 16/0 draft tokens, new n_tokens = 763
slot update_slots: id 3 | task 0 | accepted 11/0 draft tokens, new n_tokens = 775
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 778
slot update_slots: id 3 | task 0 | accepted 4/0 draft tokens, new n_tokens = 783
slot update_slots: id 3 | task 0 | accepted 8/0 draft tokens, new n_tokens = 792
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 795
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 797
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 799
slot update_slots: id 3 | task 0 | accepted 0/0 draft tokens, new n_tokens = 800
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 803
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 805
slot update_slots: id 3 | task 0 | accepted 6/0 draft tokens, new n_tokens = 812
slot update_slots: id 3 | task 0 | accepted 3/0 draft tokens, new n_tokens = 816
```
After the fix, the per-round logging is correct:
```
slot update_slots: id 3 | task 0 | accepted 7/8 draft tokens, new n_tokens = 654
slot update_slots: id 3 | task 0 | accepted 1/2 draft tokens, new n_tokens = 656
slot update_slots: id 3 | task 0 | accepted 2/16 draft tokens, new n_tokens = 659
slot update_slots: id 3 | task 0 | accepted 1/16 draft tokens, new n_tokens = 661
slot update_slots: id 3 | task 0 | accepted 2/16 draft tokens, new n_tokens = 664
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 681
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 698
slot update_slots: id 3 | task 0 | accepted 3/4 draft tokens, new n_tokens = 702
slot update_slots: id 3 | task 0 | accepted 5/12 draft tokens, new n_tokens = 708
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 725
slot update_slots: id 3 | task 0 | accepted 1/1 draft tokens, new n_tokens = 727
slot update_slots: id 3 | task 0 | accepted 8/16 draft tokens, new n_tokens = 736
```
2025-12-20 10:57:40 +01:00
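The root cause described in that commit is an ordering bug: the per-round draft counter was cleared before the log statement read it, so the denominator always printed as 0. An illustrative reduction (the struct and field names here are hypothetical, not the server's actual slot state):

```cpp
#include <cassert>
#include <string>

// Per-round speculative-decoding counters (illustrative names).
struct slot_state {
    int n_draft    = 0; // draft tokens proposed this round
    int n_accepted = 0; // draft tokens the target model accepted
};

// Build the log fragment, then reset the per-round counters.
std::string log_round(slot_state & slot) {
    // Read BEFORE clearing -- the bug was resetting the slot first,
    // which made the draft count always print as 0.
    const std::string msg = "accepted " + std::to_string(slot.n_accepted) +
                            "/"         + std::to_string(slot.n_draft) +
                            " draft tokens";
    slot.n_draft    = 0;
    slot.n_accepted = 0;
    return msg;
}
```

The fix is simply to move the read before the reset, which is why the "after" log shows non-zero denominators like 7/8 and 16/16.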
Xuan-Son Nguyen
9e39a1e6a9
server: support load model on startup, support preset-only options ( #18206 )
* server: support autoload model, support preset-only options
* add docs
* load-on-startup
* fix
* Update common/arg.cpp
Co-authored-by: Pascal <admin@serveurperso.com>
---------
Co-authored-by: Pascal <admin@serveurperso.com>
2025-12-20 09:25:27 +01:00
ddh0
f4703d422c
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-19 17:53:19 -06:00
Pascal
14931a826e
arg: fix order to use short form before long form ( #18196 )
* arg: fix order to use short form before long form
* arg: update doc
* arg: update test-arg-parser
* arg: address review feedback from ngxson
simplified to check first.length() <= last.length() only
fixed: --sampler-seq, --rerank, --draft ordering
note: middle positions in 3+ arg sets are not verified
* arg: update doc
2025-12-19 18:01:56 +01:00
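Per the review note in that commit, the ordering check was simplified to compare only `first.length() <= last.length()`: the shortest spelling of an argument must come first and the longest last, while middle positions in sets of three or more go unverified. A sketch of that rule (hypothetical helper, not llama.cpp's actual test code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Verify an argument's spellings go short form first, long form last,
// e.g. {"-n", "--n-predict"}. Only the first and last entries are
// compared; middle positions are deliberately unchecked.
bool short_form_first(const std::vector<std::string> & args) {
    return args.size() < 2 || args.front().length() <= args.back().length();
}
```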
Aman Gupta
cc0a04343e
server: friendlier error msg when ctx < input ( #18174 )
* llama-server: friendlier error msg when ctx < input
This PR adds formatted strings to the server's send_error function
* llama-server: use string_format inline
* fix test
2025-12-19 12:10:00 +01:00
Xuan-Son Nguyen
98c1c7a7bf
presets: refactor, allow cascade presets from different sources, add global section ( #18169 )
* presets: refactor, allow cascade presets from different sources
* update docs
* fix neg arg handling
* fix empty mmproj
* also filter out server-controlled args before to_ini()
* skip loading custom_models if not specified
* fix unset_reserved_args
* fix crash on windows
2025-12-19 12:08:20 +01:00
Aleksander Grygier
acb73d8340
webui: Add editing attachments in user messages ( #18147 )
* feat: Enable editing attachments in user messages
* feat: Improvements for data handling & UI
* docs: Update Architecture diagrams
* chore: update webui build output
* refactor: Exports
* chore: update webui build output
* feat: Add handling paste for Chat Message Edit Form
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
2025-12-19 11:14:07 +01:00
ddh0
dedbe36735
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-18 14:59:16 -06:00
Pascal
f9ec8858ed
webui: display prompt processing stats ( #18146 )
* webui: display prompt processing stats
* feat: Improve UI of Chat Message Statistics
* chore: update webui build output
* refactor: Post-review improvements
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-12-18 17:55:03 +01:00
Aleksander Grygier
9ce64aed7d
webui: Fix selecting generated output issues during active streaming ( #18091 )
* draft: incremental markdown rendering with stable blocks
* refactor: Logic improvements
* refactor: DRY Markdown post-processing logic
* refactor: ID generation improvements
* fix: Remove runes
* refactor: Clean up & add JSDocs
* chore: update webui static output
* fix: Add tick to prevent race conditions for rendering Markdown blocks
Suggestion from @ServeurpersoCom
Co-authored-by: Pascal <admin@serveurperso.com>
* chore: Run `npm audit fix`
* chore: update webui static output
* feat: Improve performance using global counter & id instead of UUID
* refactor: Enhance Markdown rendering with link and code features
* chore: update webui static output
* fix: Code block content extraction
* chore: update webui static output
* chore: update webui static output
---------
Co-authored-by: Pascal <admin@serveurperso.com>
2025-12-18 11:13:52 +01:00
Kim S.
900316da4e
webui: fix chat screen shadow width ( #18010 )
* webui: fix chat screen shadow width
* chore: add index.html.gz
2025-12-18 11:08:42 +01:00
ddh0
60235724cf
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-17 22:07:22 -06:00
Pascal
6ce3d85796
server: (webui) add --webui-config ( #18028 )
* server/webui: add server-side WebUI config support
Add CLI arguments --webui-config (inline JSON) and --webui-config-file
(file path) to configure WebUI default settings from server side.
Backend changes:
- Parse JSON once in server_context::load_model() for performance
- Cache parsed config in webui_settings member (zero overhead on /props)
- Add proper error handling in router mode with try/catch
- Expose webui_settings in /props endpoint for both router and child modes
Frontend changes:
- Add 14 configurable WebUI settings via parameter sync
- Add tests for webui settings extraction
- Fix subpath support with base path in API calls
Addresses feedback from @ngxson and @ggerganov
* server: address review feedback from ngxson
* server: regenerate README with llama-gen-docs
2025-12-17 21:45:45 +01:00
Xuan-Son Nguyen
e85e9d7637
server: (router) disable SSL on child process ( #18141 )
2025-12-17 21:39:08 +01:00
Kim S.
d37fc93505
webui: fix chat header width when sidebar is closed ( #17981 )
* webui: fix chat header width when sidebar is closed
* chore: add index.html.gz
2025-12-17 20:05:45 +01:00
Xuan-Son Nguyen
bde461de8c
server: (router) allow child process to report status via stdout ( #18110 )
* server: (router) allow child process to report status via stdout
* apply suggestions
2025-12-17 14:54:11 +01:00
ddh0
58aa1c6f5a
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-16 13:33:03 -06:00
yifant-code
59977eba7b
server: fix crash when batch > ubatch with embeddings ( #17912 )
* server: fix crash when batch > ubatch with embeddings (#12836 )
Fixes #12836 where the server crashes with GGML_ASSERT failure when
running with embeddings enabled and n_batch > n_ubatch.
Root cause: Embeddings use non-causal attention which requires all
tokens to be processed within a single ubatch. When n_batch > n_ubatch,
the server attempts to split processing, causing assertion failure.
Solution:
- Add parameter validation in main() after common_params_parse()
- When embeddings enabled and n_batch > n_ubatch:
* Log warnings explaining the issue
* Automatically set n_batch = n_ubatch
* Prevent server crash
This follows the approach suggested by @ggerganov in issue #12836 .
Note: This supersedes stalled PR #12940 which attempted a runtime fix
in the old examples/server/server.cpp location. This implementation
validates at startup in tools/server/server.cpp (current location).
Testing:
- Build: Compiles successfully
- Validation triggers: Warns when -b > -ub with --embedding
- Auto-correction works: Adjusts n_batch = n_ubatch
- No false positives: Valid params don't trigger warnings
- Verified on macOS M3 Pro with embedding model
* Update tools/server/server.cpp
---------
Co-authored-by: ytian218 <ytian218@bloomberg.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-12-16 14:27:36 +02:00
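The startup validation that commit describes — embeddings use non-causal attention, so the whole input must fit in one ubatch, and the server clamps `n_batch` down with a warning instead of asserting at runtime — can be sketched as follows. This is an illustrative reduction whose struct merely echoes llama.cpp's common params, not the actual patch:

```cpp
#include <cassert>
#include <cstdio>

// Minimal stand-in for the relevant common params (illustrative).
struct params_sketch {
    bool embedding = false;
    int  n_batch   = 2048;
    int  n_ubatch  = 512;
};

// Non-causal attention (embeddings) cannot split one input across
// ubatches, so clamp n_batch rather than hit an assertion later.
void validate_batch_params(params_sketch & p) {
    if (p.embedding && p.n_batch > p.n_ubatch) {
        std::fprintf(stderr,
            "warning: embeddings require n_batch <= n_ubatch, "
            "setting n_batch = %d\n", p.n_ubatch);
        p.n_batch = p.n_ubatch;
    }
}
```

Validating once at startup, as the commit notes, avoids the runtime-fix approach of the superseded PR and keeps valid parameter combinations untouched.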
Xuan-Son Nguyen
7b1db3d3b7
arg: clarify auto kvu/np being set on server ( #17997 )
* arg: clarify auto kvu/np being set on server
* improve docs
* use invalid_argument
2025-12-16 12:01:27 +01:00
2114L3
5f5f9b4637
server: Update README.md incorrect argument ( #18073 )
`n-gpu-layer` is incorrect; the argument is `n-gpu-layers`, with the 's'.
2025-12-16 11:50:43 +01:00
Aleksander Grygier
3034836d36
webui: Improve copy to clipboard with text attachments ( #17969 )
* feat: Create copy/paste user message including "pasted text" attachments
* chore: update webui build output
* chore: update webui static output
* fix: UI issues
* chore: update webui static output
* fix: Decode HTML entities using `DOMParser`
* chore: update webui build output
* chore: update webui static output
2025-12-16 07:38:46 +01:00
Aleksander Grygier
a20979d433
webui: Add setting to always show sidebar on Desktop ( #17809 )
* feat: Add setting to always show Sidebar on Desktop
* chore: update webui build output
* feat: Add auto-show sidebar setting
* fix: Mobile settings dialog UI
* chore: update webui build output
* feat: UI label update
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
* refactor: Cleanup
* chore: update webui build output
2025-12-16 07:31:37 +01:00
Darius Lukas
40d9c394f4
Webui: Disable attachment button and model selector button when prompt textbox is disabled. ( #17925 )
* Pass disabled state to the file attachments button and the model
selector button.
* Update index.html.gz
* Fix model info card in non-router mode.
* Update index.html.gz
2025-12-16 07:15:49 +01:00
ddh0
85b6e52e39
Merge branch 'ggml-org:master' into power-law-sampler
2025-12-15 21:23:25 -06:00
Pascal
0f4f35e7be
Fix unreadable user markdown colors and truncate long texts in deletion dialogs ( #17555 )
* webui: limit conversation name length in dialogs
* webui: fix unreadable colors on links and table cell hover in user markdown
* webui: keep table borders visible in user markdown
* webui: updating unified exports
* Update tools/server/webui/src/lib/components/app/chat/ChatAttachments/ChatAttachmentThumbnailFile.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* chore: update webui build output
* chore: update webui build output
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-12-15 16:34:53 +01:00