Commit Graph

219 Commits

Author SHA1 Message Date
Daniel Bevenius 7816f0bb56
Merge remote-tracking branch 'upstream/master' into backend-sampling 2025-11-24 07:44:06 +01:00
Pascal 0c7220db56
webui: minor settings reorganization and add disable autoscroll option (#17452)
* webui: added a dedicated 'Display' settings section that groups visualization options

* webui: added a Display setting to toggle automatic chat scrolling

* chore: update webui build output
2025-11-23 18:42:00 +01:00
Daniel Bevenius 79b8cf2a75
Merge remote-tracking branch 'upstream/master' into backend-sampling 2025-11-21 16:38:32 +01:00
Aleksander Grygier 4c91f2633f
Improved file naming & structure for UI components (#17405)
* refactor: Component iles naming & structure

* chore: update webui build output

* refactor: Dialog titles + components namig

* chore: update webui build output

* refactor: Imports

* chore: update webui build output
2025-11-20 14:07:31 +01:00
Daniel Bevenius 0c660e7390
Merge remote-tracking branch 'upstream/master' into backend-sampling 2025-11-20 06:57:24 +01:00
Aleksander Grygier 99c53d6558
webui: Add a "Continue" Action for Assistant Message (#16971)
* feat: Add "Continue" action for assistant messages

* feat: Continuation logic & prompt improvements

* chore: update webui build output

* feat: Improve logic for continuing the assistant message

* chore: update webui build output

* chore: Linting

* chore: update webui build output

* fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message

* chore: update webui build output

* feat: Enable "Continue" button based on config & non-reasoning model type

* chore: update webui build output

* chore: Update packages with `npm audit fix`

* fix: Remove redundant error

* chore: update webui build output

* chore: Update `.gitignore`

* fix: Add missing change

* feat: Add auto-resizing for Edit Assistant/User Message textareas

* chore: update webui build output
2025-11-19 14:39:50 +01:00
o7si 97cb3fd5ae
fix: resolve undefined variable 'svr' compilation error (#17348) 2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen 0de8878c96
server: split HTTP into its own interface (#17216)
* server: split HTTP into its own interface

* move server-http and httplib to its own file

* add the remaining endpoints

* fix exception/error handling

* renaming

* missing header

* fix missing windows header

* fix error responses from http layer

* fix slot save/restore handler

* fix case where only one stream chunk is returned

* add NOMINMAX

* do not call sink.write on empty data

* use safe_json_to_str for SSE

* clean up

* add some comments

* improve usage of next()

* bring back the "server is listening on" message

* more generic handler

* add req.headers

* move the chat template print to init()

* add req.path

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 22:05:44 +01:00
Daniel Bevenius a3eb847d24
webui : add backend sampling options 2025-11-17 16:16:05 +01:00
Daniel Bevenius f1f3e68511
server : add backend sampling options/configuration 2025-11-17 16:16:05 +01:00
Georgi Gerganov 5b2093becc
server : handle context overflow during decode (#17267)
* server : handle context overflow during decode

* server : minor refactor
2025-11-16 09:23:37 +02:00
Aleksander Grygier 22e1ce2f81
webui: Fix clickability around chat processing statistics UI (#17278)
* fix: Better pointer events handling in chat processing info elements

* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal 1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618)
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

* webui: remove scroll listener causing unnecessary layout updates (model selector)

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm run format & update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Xuan-Son Nguyen 9b17d74ab7
mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
Georgi Gerganov d396b43748
server : fix "can batch with" bug (#17263) 2025-11-14 14:03:45 +02:00
Aleksander Grygier f1bad23f88
Better UX for handling multiple attachments in WebUI (#17246) 2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen c4abcb2457
server: fixing naming conflict res_error (#17243) 2025-11-13 20:53:47 +01:00
Aleksander Grygier 8e878f0cb4
Update packages + upgrade Storybook to v10 (#17201)
* chore: Update packages + upgrade Storybook to v10

* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen 00c94083b3
server: (refactor) implement generator-based API for task results (#17174)
* server: (refactor) implement generator-based API for task results

* improve

* moving some code

* fix "Response ended prematurely"

* add sink.done before return false

* rm redundant check

* rm unused var

* rename generator --> reader
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen ee8dd5c658
server: move res_error/res_ok to static function (#17167) 2025-11-12 14:17:24 +01:00
Adrien Gallouët 78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib (#17177)
* cmake : move OpenSSL linking to vendor/cpp-httplib

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* bring back httplib 0.27.0

* add -DLLAMA_HTTPLIB

* update cmake config for visionos

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen 1d45b4228f
vendor: split httplib to cpp/h files (#17150)
* vendor: split httplib to cpp/h files

* move defines

* include httplib if curl is not used

* add TODO

* fix build ios

* fix build visionos instead
2025-11-11 13:32:58 +01:00
Georgi Gerganov cb1adf8851
server : handle failures to restore host cache (#17078)
* server : handle failures to restore host cache

* server : add tests for the prompt cache
2025-11-09 14:27:05 +02:00
chansikpark 333f2595a3
webui: fix keyboard shortcuts for new chat & edit chat title (#17007) 2025-11-08 20:52:35 +01:00
Aidan eeee367de5
server: fix correct time_ms calculation in prompt_progress (#17093)
* fix: correct time_ms calculation in send_partial_response

The time_ms field was incorrectly calculated. The division was happening
before the subtraction leading to incorrect values.

Before: (ggml_time_us() - slot.t_start_process_prompt / 1000) After:
(ggml_time_us() - slot.t_start_process_prompt) / 1000

* docs : document time_ms field in prompt_progress
2025-11-08 15:12:11 +02:00
Georgi Gerganov 16bcc1259d
kv-cache : pad the cache size to 256 for performance (#17046)
* kv-cache : pad the size of the small SWA cache for performance

* context : pad the total context to 256

* cont : future-proof the swa pad

* server : adjust test params to new logic
2025-11-07 20:03:25 +02:00
Georgi Gerganov 8c0d6bb455
server : print the samplers chain for each request (#17070) 2025-11-07 12:24:47 +02:00
Georgi Gerganov b7f9010d24
server : disable checkpoints with mtmd (#17045) 2025-11-06 12:09:29 +02:00
Georgi Gerganov 13b339bcd9
server : do not default to multiple slots with speculative decoding (#17017)
* server : do not default to multiple slots with speculative decoding

* cont : fix
2025-11-05 14:32:55 +02:00
손희준 fd2f84f468
docs: Clarify the endpoint that webui uses (#17001) 2025-11-05 11:20:28 +01:00
Georgi Gerganov 66d8eccd42
server : do context shift only while generating (#17000) 2025-11-04 19:21:36 +02:00
Aleksander Grygier e7da30b584
fix: Viewing multiple PDF attachments (#16974) 2025-11-03 18:53:26 +01:00
Georgi Gerganov 48bd26501b
server : add props.model_alias (#16943)
* server : add props.model_alias

* webui : npm run format
2025-11-03 14:38:23 +01:00
Xuan-Son Nguyen 070ff4d535
mtmd: add --image-min/max-tokens (#16921) 2025-11-03 11:11:18 +01:00
Sascha Rogmann bcfa87622a
feat(webui): improve LaTeX rendering with currency detection (#16508)
* webui : Revised LaTeX formula recognition

* webui : Further examples containg amounts

* webui : vitest for maskInlineLaTeX

* webui: Moved preprocessLaTeX to lib/utils

* webui: LaTeX in table-cells

* chore: update webui build output (use theirs)

* webui: backslash in LaTeX-preprocessing

* chore: update webui build output

* webui: look-behind backslash-check

* chore: update webui build output

* Apply suggestions from code review

Code maintenance (variable names, code formatting, string handling)

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: Moved constants to lib/constants.

* webui: package woff2 inside base64 data

* webui: LaTeX-line-break in display formula

* chore: update webui build output

* webui: Bugfix (font embedding)

* webui: Bugfix (font embedding)

* webui: vite embeds assets

* webui: don't suppress 404 (fonts)

* refactor: KaTeX integration with SCSS

Moves KaTeX styling to SCSS for better customization and font embedding.

This change includes:
- Adding `sass` as a dev dependency.
- Introducing a custom SCSS file to override KaTeX variables and disable TTF/WOFF fonts, relying solely on WOFF2 for embedding.
- Adjusting the Vite configuration to resolve `katex-fonts` alias and inject SCSS variables.

* fix: LaTeX processing within blockquotes

* webui: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-03 00:41:08 +01:00
Georgi Gerganov 2f966b8ed8
clip : use FA (#16837)
* clip : use FA

* cont : add warning about unsupported ops

* implement "auto" mode for clip flash attn

* clip : print more detailed op support info during warmup

* cont : remove obsolete comment [no ci]

* improve debugging message

* trailing space

* metal : remove stray return

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-02 21:21:48 +01:00
Georgi Gerganov cd5e3b5754
server : support unified cache across slots (#16736)
* server : support unified context across slots

* cont : fix speculative decoding initialization

* context : fix n_ctx_per_seq computation

* server : purge slots one by one

* tests : add unified cache server tests

* llama : update per-seq context computation

* test-thread-safety : handle tiny training context of the input model

* server : fix server_tokens clear()

* server : use 4 slots + unified KV by default

* llama : add note about context size queries

* cont : update todos [no ci]

* context : do not cap the size of the context

* tests : adjust parameters to be CI friendlier

* context : add warning
2025-11-02 18:14:04 +02:00
Pascal 2f68ce7cfd
webui: auto-refresh /props on inference start to resync model metadata (#16784)
* webui: auto-refresh /props on inference start to resync model metadata

- Add no-cache headers to /props and /slots
- Throttle slot checks to 30s
- Prevent concurrent fetches with promise guard
- Trigger refresh from chat streaming for legacy and ModelSelector
- Show dynamic serverWarning when using cached data

* fix: restore proper legacy behavior in webui by using unified /props refresh

Updated assistant message bubbles to show each message's stored model when available,
falling back to the current server model only when the per-message value is missing

When the model selector is disabled, now fetches /props and prioritizes that model name
over chunk metadata, then persists it with the streamed message so legacy mode properly
reflects the backend configuration

* fix: detect first valid SSE chunk and refresh server props once

* fix: removed the slots availability throttle constant and state

* webui: purge ai-generated cruft

* chore: update webui static build
2025-11-01 19:49:51 +01:00
Pascal e4a71599e5
webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (#16757)
* webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe dialog

Extended MarkdownContent to flag previewable code languages,
add a preview button alongside copy controls, manage preview
dialog state, and share styling for the new button group

Introduced CodePreviewDialog.svelte, a sandboxed iframe modal
for rendering HTML/JS previews with consistent dialog controls

* webui: fullscreen HTML preview dialog using bits-ui

* Update tools/server/webui/src/lib/components/app/misc/CodePreviewDialog.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/misc/MarkdownContent.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* webui: pedantic style tweak for CodePreviewDialog close button

* webui: remove overengineered preview language logic

* chore: update webui static build

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-01 17:14:54 +01:00
Aleksander Grygier d8b860a219
Add a setting to display message generation statistics (#16901)
* feat: Add setting to display message generation statistics

* chore: build static webui output
2025-11-01 15:35:57 +01:00
Jaromír Hradílek 1ae74882f8
webui: recognize AsciiDoc files as valid text files (#16850)
* webui: recognize AsciiDoc files as valid text files

* webui: add an updated static webui build

* webui: add the updated dependency list

* webui: re-add an updated static webui build

This also reverts commit 742dbb8379.
2025-11-01 15:02:57 +01:00
Georgi Gerganov c22473b580
server : don't print user inputs to console (#16871) 2025-10-31 10:54:19 +02:00
Daniel Bevenius 0f715b4e75
server : fix typos in server.cpp comments [no ci] (#16883) 2025-10-31 09:51:26 +01:00
chansikpark 16724b5b68
server : bump request URI max length to 32768 (#16862) 2025-10-30 20:22:23 +02:00
Georgi Gerganov b52edd2558
server : remove n_past (#16818)
* server : remove n_past

* server : replace slot.n_prompt_tokens() with slot.task->n_tokens()

* server : fixes + clean-up

* cont : fix context shift

* server : add server_tokens::pos_next()

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

* server : fix pos_next() usage

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
2025-10-30 18:42:57 +02:00
Georgi Gerganov 85a7d8677b
memory : remove KV cache size padding (#16812)
* memory : remove KV cache size padding

* cont : restore padding for n_kv tensor shape

* server : use slot context size instead of training context size

* server : simplify context limit logic
2025-10-28 20:19:44 +02:00
Aldehir Rojas 280d97be96
grammar : support array references in json schema (#16792)
* grammar : support array references in json schema

* Update json-schema-to-grammar.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* grammar : improve regex when naming ref derived rules

* grammar : replace non-conformant definitions array with anyOf test case

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-28 09:37:52 +01:00
Florian Badie 69e9ff0103
webui: support q URL parameter (#16728)
* webui: support q URL parameter

Fixes #16722
I’ve checked that it works with Firefox’s AI tools

* webui: apply suggestions from code review

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: update webui static build

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-10-24 14:10:29 +02:00
Johannes Gäßler 0bf47a1dbb
server: add memory breakdown print (#16740) 2025-10-23 21:30:17 +02:00
matteo 8cf6b42d46
server : send partial stop string when <EOG> is reached (#15007) 2025-10-23 12:32:24 +03:00