Xuan Son Nguyen
d65be9170b
address review comments
2025-11-23 19:31:21 +01:00
Xuan Son Nguyen
5ad594e6d6
cleaner
2025-11-23 19:02:07 +01:00
Xuan Son Nguyen
2e355c7f8e
oai-compat /models endpoint
2025-11-23 17:25:24 +01:00
Xuan Son Nguyen
f95f9c5128
typo docs
2025-11-23 16:14:02 +01:00
Xuan Son Nguyen
74685f4194
allow reusing args if auto_load
2025-11-23 15:42:33 +01:00
Xuan Son Nguyen
f927e21ffc
support extra_args on loading model
2025-11-23 15:39:03 +01:00
Xuan Son Nguyen
7ef6312f85
add note
2025-11-23 15:08:31 +01:00
Xuan Son Nguyen
f25bfaba4d
expose args and exit_code in API
2025-11-23 14:59:04 +01:00
Xuan Son Nguyen
4af1b6cbac
Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2
...
Co-authored-by: Aleksander <aleksander.grygier@gmail.com>
2025-11-22 18:39:31 +01:00
Xuan Son Nguyen
d32bbfec82
add endpoint docs
2025-11-22 18:01:48 +01:00
Xuan Son Nguyen
525e2746df
address review comments
2025-11-21 23:25:34 +01:00
Xuan Son Nguyen
7241558835
better --models-dir
2025-11-21 23:06:09 +01:00
Xuan Son Nguyen
7cd929076d
remove default model path
2025-11-21 22:33:04 +01:00
Xuan Son Nguyen
62ee883d5a
implement LRU
2025-11-21 22:22:57 +01:00
Xuan Son Nguyen
032b9ff4a9
add --models-dir param
2025-11-21 11:11:01 +01:00
Xuan Son Nguyen
a2e912cf35
address review comment
2025-11-20 21:54:22 +01:00
Xuan Son Nguyen
cd5c699304
add docs (first version)
2025-11-20 21:45:05 +01:00
Xuan Son Nguyen
be25bccdff
address review comment
2025-11-20 21:37:22 +01:00
Xuan Son Nguyen
6929c9f43d
address thread safety issue
2025-11-20 18:38:02 +01:00
Xuan Son Nguyen
5369aaa1d6
address most problems
2025-11-20 18:34:22 +01:00
Xuan Son Nguyen
5805ca7960
add is_active()
2025-11-20 16:26:31 +01:00
Xuan Son Nguyen
d0ea9e0830
also allow terminate loading model
2025-11-20 16:20:14 +01:00
Xuan Son Nguyen
6610724f8e
fix unsafe pointer
2025-11-20 16:13:30 +01:00
Xuan Son Nguyen
b9ebdf616a
more stable
2025-11-20 15:49:40 +01:00
Xuan Son Nguyen
919d3f8cbf
Merge branch 'master' into xsn/server_model_management_v1_2
2025-11-20 14:19:16 +01:00
Aleksander Grygier
4c91f2633f
Improved file naming & structure for UI components ( #17405 )
...
* refactor: Component files naming & structure
* chore: update webui build output
* refactor: Dialog titles + components naming
* chore: update webui build output
* refactor: Imports
* chore: update webui build output
2025-11-20 14:07:31 +01:00
Xuan Son Nguyen
7c6eb17fad
fix windows
2025-11-20 13:14:56 +01:00
Georgi Gerganov
196f5083ef
common : more accurate sampling timing ( #17382 )
...
* common : more accurate sampling timing
* eval-callback : minor fixes
* cont : add time_meas impl
* cont : fix log msg [no ci]
* cont : fix multiple definitions of time_meas
* llama-cli : exclude chat template init from time measurement
* cont : print percentage of unaccounted time
* cont : do not reset timings
2025-11-20 13:40:10 +02:00
Xuan Son Nguyen
0ef3b61e82
add test
2025-11-20 00:29:59 +01:00
Xuan Son Nguyen
5423d42a35
use subprocess.h, better logging
2025-11-20 00:05:29 +01:00
Xuan Son Nguyen
54b3545791
fix windows build
2025-11-19 22:30:47 +01:00
Xuan Son Nguyen
abc0ca478a
does this fix windows?
2025-11-19 22:24:00 +01:00
Xuan Son Nguyen
399f536dc7
fix compile error
2025-11-19 21:33:44 +01:00
Xuan Son Nguyen
fc5901a449
server: add model management and proxy
2025-11-19 21:23:00 +01:00
Aleksander Grygier
99c53d6558
webui: Add a "Continue" Action for Assistant Message ( #16971 )
...
* feat: Add "Continue" action for assistant messages
* feat: Continuation logic & prompt improvements
* chore: update webui build output
* feat: Improve logic for continuing the assistant message
* chore: update webui build output
* chore: Linting
* chore: update webui build output
* fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message
* chore: update webui build output
* feat: Enable "Continue" button based on config & non-reasoning model type
* chore: update webui build output
* chore: Update packages with `npm audit fix`
* fix: Remove redundant error
* chore: update webui build output
* chore: Update `.gitignore`
* fix: Add missing change
* feat: Add auto-resizing for Edit Assistant/User Message textareas
* chore: update webui build output
2025-11-19 14:39:50 +01:00
o7si
97cb3fd5ae
fix: resolve undefined variable 'svr' compilation error ( #17348 )
2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen
0de8878c96
server: split HTTP into its own interface ( #17216 )
...
* server: split HTTP into its own interface
* move server-http and httplib to its own file
* add the remaining endpoints
* fix exception/error handling
* renaming
* missing header
* fix missing windows header
* fix error responses from http layer
* fix slot save/restore handler
* fix case where only one stream chunk is returned
* add NOMINMAX
* do not call sink.write on empty data
* use safe_json_to_str for SSE
* clean up
* add some comments
* improve usage of next()
* bring back the "server is listening on" message
* more generic handler
* add req.headers
* move the chat template print to init()
* add req.path
* cont : minor
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 22:05:44 +01:00
Georgi Gerganov
5b2093becc
server : handle context overflow during decode ( #17267 )
...
* server : handle context overflow during decode
* server : minor refactor
2025-11-16 09:23:37 +02:00
Aleksander Grygier
22e1ce2f81
webui: Fix clickability around chat processing statistics UI ( #17278 )
...
* fix: Better pointer events handling in chat processing info elements
* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal
1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI ( #16618 )
...
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI
- Purely visual and diagnostic change, no effect on model context, prompt construction, or inference behavior
- Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use
- Exposed parsed tool call labels beneath the assistant's model info line with graceful fallback when parsing fails
- Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling
- Added a user-facing setting to toggle tool call visibility to the Developer settings section directly under the model selector option
* webui: remove scroll listener causing unnecessary layout updates (model selector)
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* chore: npm run format & update webui build output
* chore: update webui build output
---------
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Ankur Verma
c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli ( #17277 )
2025-11-15 12:41:16 +01:00
Xuan-Son Nguyen
9b17d74ab7
mtmd: add mtmd_log_set ( #17268 )
2025-11-14 15:56:19 +01:00
Georgi Gerganov
d396b43748
server : fix "can batch with" bug ( #17263 )
2025-11-14 14:03:45 +02:00
Aleksander Grygier
f1bad23f88
Better UX for handling multiple attachments in WebUI ( #17246 )
2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen
c4abcb2457
server: fixing naming conflict res_error ( #17243 )
2025-11-13 20:53:47 +01:00
Aleksander Grygier
8e878f0cb4
Update packages + upgrade Storybook to v10 ( #17201 )
...
* chore: Update packages + upgrade Storybook to v10
* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen
00c94083b3
server: (refactor) implement generator-based API for task results ( #17174 )
...
* server: (refactor) implement generator-based API for task results
* improve
* moving some code
* fix "Response ended prematurely"
* add sink.done before return false
* rm redundant check
* rm unused var
* rename generator --> reader
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen
ee8dd5c658
server: move res_error/res_ok to static function ( #17167 )
2025-11-12 14:17:24 +01:00
Adrien Gallouët
78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib ( #17177 )
...
* cmake : move OpenSSL linking to vendor/cpp-httplib
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* bring back httplib 0.27.0
* add -DLLAMA_HTTPLIB
* update cmake config for visionos
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen
1d45b4228f
vendor: split httplib to cpp/h files ( #17150 )
...
* vendor: split httplib to cpp/h files
* move defines
* include httplib if curl is not used
* add TODO
* fix build ios
* fix build visionos instead
2025-11-11 13:32:58 +01:00