Commit Graph

378 Commits

Author SHA1 Message Date
Xuan Son Nguyen d65be9170b address review comments 2025-11-23 19:31:21 +01:00
Xuan Son Nguyen 5ad594e6d6 cleaner 2025-11-23 19:02:07 +01:00
Xuan Son Nguyen 2e355c7f8e oai-compat /models endpoint 2025-11-23 17:25:24 +01:00
Xuan Son Nguyen f95f9c5128 typo docs 2025-11-23 16:14:02 +01:00
Xuan Son Nguyen 74685f4194 allow reusing args if auto_load 2025-11-23 15:42:33 +01:00
Xuan Son Nguyen f927e21ffc support extra_args on loading model 2025-11-23 15:39:03 +01:00
Xuan Son Nguyen 7ef6312f85 add note 2025-11-23 15:08:31 +01:00
Xuan Son Nguyen f25bfaba4d expose args and exit_code in API 2025-11-23 14:59:04 +01:00
Xuan Son Nguyen 4af1b6cbac Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2
Co-authored-by: Aleksander <aleksander.grygier@gmail.com>
2025-11-22 18:39:31 +01:00
Xuan Son Nguyen d32bbfec82 ad endpoint docs 2025-11-22 18:01:48 +01:00
Xuan Son Nguyen 525e2746df address review comments 2025-11-21 23:25:34 +01:00
Xuan Son Nguyen 7241558835 better --models-dir 2025-11-21 23:06:09 +01:00
Xuan Son Nguyen 7cd929076d remove default model path 2025-11-21 22:33:04 +01:00
Xuan Son Nguyen 62ee883d5a implement LRU 2025-11-21 22:22:57 +01:00
Xuan Son Nguyen 032b9ff4a9 add --models-dir param 2025-11-21 11:11:01 +01:00
Xuan Son Nguyen a2e912cf35 address review comment 2025-11-20 21:54:22 +01:00
Xuan Son Nguyen cd5c699304 add docs (first version) 2025-11-20 21:45:05 +01:00
Xuan Son Nguyen be25bccdff address review comment 2025-11-20 21:37:22 +01:00
Xuan Son Nguyen 6929c9f43d address thread safety issue 2025-11-20 18:38:02 +01:00
Xuan Son Nguyen 5369aaa1d6 address most problems 2025-11-20 18:34:22 +01:00
Xuan Son Nguyen 5805ca7960 add is_active() 2025-11-20 16:26:31 +01:00
Xuan Son Nguyen d0ea9e0830 also allow terminate loading model 2025-11-20 16:20:14 +01:00
Xuan Son Nguyen 6610724f8e fix unsafe pointer 2025-11-20 16:13:30 +01:00
Xuan Son Nguyen b9ebdf616a more stable 2025-11-20 15:49:40 +01:00
Xuan Son Nguyen 919d3f8cbf Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-20 14:19:16 +01:00
Aleksander Grygier 4c91f2633f
Improved file naming & structure for UI components (#17405)
* refactor: Component iles naming & structure

* chore: update webui build output

* refactor: Dialog titles + components namig

* chore: update webui build output

* refactor: Imports

* chore: update webui build output
2025-11-20 14:07:31 +01:00
Xuan Son Nguyen 7c6eb17fad fix windows 2025-11-20 13:14:56 +01:00
Georgi Gerganov 196f5083ef
common : more accurate sampling timing (#17382)
* common : more accurate sampling timing

* eval-callback : minor fixes

* cont : add time_meas impl

* cont : fix log msg [no ci]

* cont : fix multiple definitions of time_meas

* llama-cli : exclude chat template init from time measurement

* cont : print percentage of unaccounted time

* cont : do not reset timings
2025-11-20 13:40:10 +02:00
Xuan Son Nguyen 0ef3b61e82 add test 2025-11-20 00:29:59 +01:00
Xuan Son Nguyen 5423d42a35 use subprocess.h, better logging 2025-11-20 00:05:29 +01:00
Xuan Son Nguyen 54b3545791 fix windows build 2025-11-19 22:30:47 +01:00
Xuan Son Nguyen abc0ca478a does this fix windows? 2025-11-19 22:24:00 +01:00
Xuan Son Nguyen 399f536dc7 fix compile error 2025-11-19 21:33:44 +01:00
Xuan Son Nguyen fc5901a449 server: add model management and proxy 2025-11-19 21:23:00 +01:00
Aleksander Grygier 99c53d6558
webui: Add a "Continue" Action for Assistant Message (#16971)
* feat: Add "Continue" action for assistant messages

* feat: Continuation logic & prompt improvements

* chore: update webui build output

* feat: Improve logic for continuing the assistant message

* chore: update webui build output

* chore: Linting

* chore: update webui build output

* fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message

* chore: update webui build output

* feat: Enable "Continue" button based on config & non-reasoning model type

* chore: update webui build output

* chore: Update packages with `npm audit fix`

* fix: Remove redundant error

* chore: update webui build output

* chore: Update `.gitignore`

* fix: Add missing change

* feat: Add auto-resizing for Edit Assistant/User Message textareas

* chore: update webui build output
2025-11-19 14:39:50 +01:00
o7si 97cb3fd5ae
fix: resolve undefined variable 'svr' compilation error (#17348) 2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen 0de8878c96
server: split HTTP into its own interface (#17216)
* server: split HTTP into its own interface

* move server-http and httplib to its own file

* add the remaining endpoints

* fix exception/error handling

* renaming

* missing header

* fix missing windows header

* fix error responses from http layer

* fix slot save/restore handler

* fix case where only one stream chunk is returned

* add NOMINMAX

* do not call sink.write on empty data

* use safe_json_to_str for SSE

* clean up

* add some comments

* improve usage of next()

* bring back the "server is listening on" message

* more generic handler

* add req.headers

* move the chat template print to init()

* add req.path

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 22:05:44 +01:00
Georgi Gerganov 5b2093becc
server : handle context overflow during decode (#17267)
* server : handle context overflow during decode

* server : minor refactor
2025-11-16 09:23:37 +02:00
Aleksander Grygier 22e1ce2f81
webui: Fix clickability around chat processing statistics UI (#17278)
* fix: Better pointer events handling in chat processing info elements

* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal 1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618)
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

* webui: remove scroll listener causing unnecessary layout updates (model selector)

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm run format & update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Ankur Verma c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) 2025-11-15 12:41:16 +01:00
Xuan-Son Nguyen 9b17d74ab7
mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
Georgi Gerganov d396b43748
server : fix "can batch with" bug (#17263) 2025-11-14 14:03:45 +02:00
Aleksander Grygier f1bad23f88
Better UX for handling multiple attachments in WebUI (#17246) 2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen c4abcb2457
server: fixing naming conflict res_error (#17243) 2025-11-13 20:53:47 +01:00
Aleksander Grygier 8e878f0cb4
Update packages + upgrade Storybook to v10 (#17201)
* chore: Update packages + upgrade Storybook to v10

* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen 00c94083b3
server: (refactor) implement generator-based API for task results (#17174)
* server: (refactor) implement generator-based API for task results

* improve

* moving some code

* fix "Response ended prematurely"

* add sink.done before return false

* rm redundant check

* rm unused var

* rename generator --> reader
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen ee8dd5c658
server: move res_error/res_ok to static function (#17167) 2025-11-12 14:17:24 +01:00
Adrien Gallouët 78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib (#17177)
* cmake : move OpenSSL linking to vendor/cpp-httplib

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* bring back httplib 0.27.0

* add -DLLAMA_HTTPLIB

* update cmake config for visionos

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen 1d45b4228f
vendor: split httplib to cpp/h files (#17150)
* vendor: split httplib to cpp/h files

* move defines

* include httplib if curl is not used

* add TODO

* fix build ios

* fix build visionos instead
2025-11-11 13:32:58 +01:00