Xuan Son Nguyen
7241558835
better --models-dir
2025-11-21 23:06:09 +01:00
Xuan Son Nguyen
7cd929076d
remove default model path
2025-11-21 22:33:04 +01:00
Xuan Son Nguyen
62ee883d5a
implement LRU
2025-11-21 22:22:57 +01:00
Xuan Son Nguyen
032b9ff4a9
add --models-dir param
2025-11-21 11:11:01 +01:00
Xuan Son Nguyen
a2e912cf35
address review comment
2025-11-20 21:54:22 +01:00
Xuan Son Nguyen
cd5c699304
add docs (first version)
2025-11-20 21:45:05 +01:00
Xuan Son Nguyen
be25bccdff
address review comment
2025-11-20 21:37:22 +01:00
Xuan Son Nguyen
6929c9f43d
address thread safety issue
2025-11-20 18:38:02 +01:00
Xuan Son Nguyen
5369aaa1d6
address most problems
2025-11-20 18:34:22 +01:00
Xuan Son Nguyen
5805ca7960
add is_active()
2025-11-20 16:26:31 +01:00
Xuan Son Nguyen
d0ea9e0830
also allow terminate loading model
2025-11-20 16:20:14 +01:00
Xuan Son Nguyen
6610724f8e
fix unsafe pointer
2025-11-20 16:13:30 +01:00
Xuan Son Nguyen
b9ebdf616a
more stable
2025-11-20 15:49:40 +01:00
Xuan Son Nguyen
919d3f8cbf
Merge branch 'master' into xsn/server_model_management_v1_2
2025-11-20 14:19:16 +01:00
Aleksander Grygier
4c91f2633f
Improved file naming & structure for UI components ( #17405 )
* refactor: Component files naming & structure
* chore: update webui build output
* refactor: Dialog titles + components naming
* chore: update webui build output
* refactor: Imports
* chore: update webui build output
2025-11-20 14:07:31 +01:00
Xuan Son Nguyen
7c6eb17fad
fix windows
2025-11-20 13:14:56 +01:00
Xuan Son Nguyen
0ef3b61e82
add test
2025-11-20 00:29:59 +01:00
Xuan Son Nguyen
5423d42a35
use subprocess.h, better logging
2025-11-20 00:05:29 +01:00
Xuan Son Nguyen
54b3545791
fix windows build
2025-11-19 22:30:47 +01:00
Xuan Son Nguyen
abc0ca478a
does this fix windows?
2025-11-19 22:24:00 +01:00
Xuan Son Nguyen
399f536dc7
fix compile error
2025-11-19 21:33:44 +01:00
Xuan Son Nguyen
fc5901a449
server: add model management and proxy
2025-11-19 21:23:00 +01:00
Aleksander Grygier
99c53d6558
webui: Add a "Continue" Action for Assistant Message ( #16971 )
* feat: Add "Continue" action for assistant messages
* feat: Continuation logic & prompt improvements
* chore: update webui build output
* feat: Improve logic for continuing the assistant message
* chore: update webui build output
* chore: Linting
* chore: update webui build output
* fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message
* chore: update webui build output
* feat: Enable "Continue" button based on config & non-reasoning model type
* chore: update webui build output
* chore: Update packages with `npm audit fix`
* fix: Remove redundant error
* chore: update webui build output
* chore: Update `.gitignore`
* fix: Add missing change
* feat: Add auto-resizing for Edit Assistant/User Message textareas
* chore: update webui build output
2025-11-19 14:39:50 +01:00
o7si
97cb3fd5ae
fix: resolve undefined variable 'svr' compilation error ( #17348 )
2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen
0de8878c96
server: split HTTP into its own interface ( #17216 )
* server: split HTTP into its own interface
* move server-http and httplib to its own file
* add the remaining endpoints
* fix exception/error handling
* renaming
* missing header
* fix missing windows header
* fix error responses from http layer
* fix slot save/restore handler
* fix case where only one stream chunk is returned
* add NOMINMAX
* do not call sink.write on empty data
* use safe_json_to_str for SSE
* clean up
* add some comments
* improve usage of next()
* bring back the "server is listening on" message
* more generic handler
* add req.headers
* move the chat template print to init()
* add req.path
* cont : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 22:05:44 +01:00
Georgi Gerganov
5b2093becc
server : handle context overflow during decode ( #17267 )
* server : handle context overflow during decode
* server : minor refactor
2025-11-16 09:23:37 +02:00
Aleksander Grygier
22e1ce2f81
webui: Fix clickability around chat processing statistics UI ( #17278 )
* fix: Better pointer events handling in chat processing info elements
* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal
1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI ( #16618 )
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI
- Purely visual and diagnostic change, no effect on model context, prompt
construction, or inference behavior
- Captured assistant tool call payloads during streaming and non-streaming
completions, and persisted them in chat state and storage for downstream use
- Exposed parsed tool call labels beneath the assistant's model info line
with graceful fallback when parsing fails
- Added tool call badges beneath assistant responses that expose JSON tooltips
and copy their payloads when clicked, matching the existing model badge styling
- Added a user-facing setting to toggle tool call visibility to the Developer
settings section directly under the model selector option
* webui: remove scroll listener causing unnecessary layout updates (model selector)
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* chore: npm run format & update webui build output
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set ( #17268 )
2025-11-14 15:56:19 +01:00
Georgi Gerganov
d396b43748
server : fix "can batch with" bug ( #17263 )
2025-11-14 14:03:45 +02:00
Aleksander Grygier
f1bad23f88
Better UX for handling multiple attachments in WebUI ( #17246 )
2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen
c4abcb2457
server: fixing naming conflict res_error ( #17243 )
2025-11-13 20:53:47 +01:00
Aleksander Grygier
8e878f0cb4
Update packages + upgrade Storybook to v10 ( #17201 )
* chore: Update packages + upgrade Storybook to v10
* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen
00c94083b3
server: (refactor) implement generator-based API for task results ( #17174 )
* server: (refactor) implement generator-based API for task results
* improve
* moving some code
* fix "Response ended prematurely"
* add sink.done before return false
* rm redundant check
* rm unused var
* rename generator --> reader
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen
ee8dd5c658
server: move res_error/res_ok to static function ( #17167 )
2025-11-12 14:17:24 +01:00
Adrien Gallouët
78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib ( #17177 )
* cmake : move OpenSSL linking to vendor/cpp-httplib
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* bring back httplib 0.27.0
* add -DLLAMA_HTTPLIB
* update cmake config for visionos
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen
1d45b4228f
vendor: split httplib to cpp/h files ( #17150 )
* vendor: split httplib to cpp/h files
* move defines
* include httplib if curl is not used
* add TODO
* fix build ios
* fix build visionos instead
2025-11-11 13:32:58 +01:00
Georgi Gerganov
cb1adf8851
server : handle failures to restore host cache ( #17078 )
* server : handle failures to restore host cache
* server : add tests for the prompt cache
2025-11-09 14:27:05 +02:00
chansikpark
333f2595a3
webui: fix keyboard shortcuts for new chat & edit chat title ( #17007 )
2025-11-08 20:52:35 +01:00
Aidan
eeee367de5
server: fix correct time_ms calculation in prompt_progress ( #17093 )
* fix: correct time_ms calculation in send_partial_response
The time_ms field was incorrectly calculated. The division was happening
before the subtraction, leading to incorrect values.
Before: (ggml_time_us() - slot.t_start_process_prompt / 1000)
After: (ggml_time_us() - slot.t_start_process_prompt) / 1000
* docs : document time_ms field in prompt_progress
2025-11-08 15:12:11 +02:00
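Note on the time_ms fix above: the bug comes down to operator precedence, since `/` binds tighter than `-`. The following is a minimal sketch of the two expressions, not the server code itself. The function names `time_ms_buggy` and `time_ms_fixed` and the sample timestamps are made up for illustration; only the shape of the expressions is taken from the commit message.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; these names do not appear in
// the server source. Inputs are timestamps in microseconds, and the result
// is intended to be an elapsed time in milliseconds.

// Shape of the expression before the fix: only the start timestamp is
// divided by 1000, because division is evaluated before subtraction.
constexpr int64_t time_ms_buggy(int64_t now_us, int64_t t_start_us) {
    return now_us - t_start_us / 1000;
}

// Shape of the expression after the fix: subtract first, then convert the
// elapsed microseconds to milliseconds.
constexpr int64_t time_ms_fixed(int64_t now_us, int64_t t_start_us) {
    return (now_us - t_start_us) / 1000;
}

// Assumed sample values: start at 2 s, now at 5 s, so 3 s have elapsed.
static_assert(time_ms_fixed(5000000, 2000000) == 3000,
              "3 s elapsed should be reported as 3000 ms");
static_assert(time_ms_buggy(5000000, 2000000) == 4998000,
              "the buggy form returns a value close to the raw timestamp");
```

With these sample values the buggy form is off by several orders of magnitude, which is consistent with the "incorrect values" the commit describes.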
Georgi Gerganov
16bcc1259d
kv-cache : pad the cache size to 256 for performance ( #17046 )
* kv-cache : pad the size of the small SWA cache for performance
* context : pad the total context to 256
* cont : future-proof the swa pad
* server : adjust test params to new logic
2025-11-07 20:03:25 +02:00
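Note on the padding commit above: the log does not show how the padding is computed. A common reading of "pad to 256" is rounding a size up to the next multiple of 256, and the sketch below assumes exactly that. The helper name `pad_to_multiple` is hypothetical and is not taken from the llama.cpp source.

```cpp
#include <cstdint>

// Hypothetical helper for illustration: round n up to the next multiple of
// `multiple`. Assumes multiple > 0 and that n + multiple - 1 does not
// overflow uint32_t.
constexpr uint32_t pad_to_multiple(uint32_t n, uint32_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// A size that is already aligned is left unchanged.
static_assert(pad_to_multiple(4096, 256) == 4096, "aligned input is unchanged");
// An unaligned size is rounded up, never down.
static_assert(pad_to_multiple(1000, 256) == 1024, "1000 rounds up to 1024");
static_assert(pad_to_multiple(1, 256) == 256, "1 rounds up to 256");
```

Under this assumption, a requested size such as 1000 would become 1024 after padding, which may explain why the same commit adjusts the server test parameters.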
Georgi Gerganov
8c0d6bb455
server : print the samplers chain for each request ( #17070 )
2025-11-07 12:24:47 +02:00
Georgi Gerganov
b7f9010d24
server : disable checkpoints with mtmd ( #17045 )
2025-11-06 12:09:29 +02:00
Georgi Gerganov
13b339bcd9
server : do not default to multiple slots with speculative decoding ( #17017 )
* server : do not default to multiple slots with speculative decoding
* cont : fix
2025-11-05 14:32:55 +02:00
손희준
fd2f84f468
docs: Clarify the endpoint that webui uses ( #17001 )
2025-11-05 11:20:28 +01:00
Georgi Gerganov
66d8eccd42
server : do context shift only while generating ( #17000 )
2025-11-04 19:21:36 +02:00
Aleksander Grygier
e7da30b584
fix: Viewing multiple PDF attachments ( #16974 )
2025-11-03 18:53:26 +01:00
Georgi Gerganov
48bd26501b
server : add props.model_alias ( #16943 )
* server : add props.model_alias
* webui : npm run format
2025-11-03 14:38:23 +01:00
Xuan-Son Nguyen
070ff4d535
mtmd: add --image-min/max-tokens ( #16921 )
2025-11-03 11:11:18 +01:00
Sascha Rogmann
bcfa87622a
feat(webui): improve LaTeX rendering with currency detection ( #16508 )
* webui : Revised LaTeX formula recognition
* webui : Further examples containing amounts
* webui : vitest for maskInlineLaTeX
* webui: Moved preprocessLaTeX to lib/utils
* webui: LaTeX in table-cells
* chore: update webui build output (use theirs)
* webui: backslash in LaTeX-preprocessing
* chore: update webui build output
* webui: look-behind backslash-check
* chore: update webui build output
* Apply suggestions from code review
Code maintenance (variable names, code formatting, string handling)
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* webui: Moved constants to lib/constants.
* webui: package woff2 inside base64 data
* webui: LaTeX-line-break in display formula
* chore: update webui build output
* webui: Bugfix (font embedding)
* webui: Bugfix (font embedding)
* webui: vite embeds assets
* webui: don't suppress 404 (fonts)
* refactor: KaTeX integration with SCSS
Moves KaTeX styling to SCSS for better customization and font embedding.
This change includes:
- Adding `sass` as a dev dependency.
- Introducing a custom SCSS file to override KaTeX variables and disable TTF/WOFF fonts, relying solely on WOFF2 for embedding.
- Adjusting the Vite configuration to resolve `katex-fonts` alias and inject SCSS variables.
* fix: LaTeX processing within blockquotes
* webui: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-03 00:41:08 +01:00