Commit Graph

475 Commits

Author SHA1 Message Date
Aleksander Grygier 7db3d87434 fix: Retrieve lost setting after resolving merge conflict 2025-11-24 16:07:15 +01:00
Xuan Son Nguyen e514b86d2b fix merge 2025-11-24 14:50:42 +01:00
Xuan Son Nguyen 399b39f21b Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-24 14:45:57 +01:00
Xuan-Son Nguyen b8372eecd9
server: split server.cpp code into server/common/task/queue (#17362)
* add server-task, server-common

* add server-queue

* rm redundant includes

* move enum stop_type to server-task

* server : headers cleanup

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-24 14:41:53 +01:00
Xuan Son Nguyen 539cbf003e add stdin_file 2025-11-24 14:21:21 +01:00
Xuan Son Nguyen 2c6b58f785 nits 2025-11-24 12:20:34 +01:00
Xuan Son Nguyen 6ed192b4dd add --models-allow-extra-args for security 2025-11-24 12:01:16 +01:00
Aleksander Grygier 5ef3f990b9 chore: update webui build output 2025-11-24 02:24:27 +01:00
Aleksander Grygier b2590a7f6c refactor: Cleanup 2025-11-24 02:24:10 +01:00
Aleksander Grygier 13fe8607c5 refactor: Cleanup 2025-11-24 01:42:42 +01:00
Aleksander Grygier 76557cd5d3 Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-24 00:36:00 +01:00
Aleksander Grygier e808f2b2e6 chore: update webui build output 2025-11-23 23:45:08 +01:00
Aleksander Grygier 16747dee5b refactor: UI badges 2025-11-23 23:44:14 +01:00
Aleksander Grygier 188d3236e4 chore: update webui build output 2025-11-23 23:28:49 +01:00
Aleksander Grygier 39fb1c2b17 refactor: Cleanup 2025-11-23 23:28:28 +01:00
Aleksander Grygier fb5445e9ce chore: update webui build output 2025-11-23 23:25:05 +01:00
Aleksander Grygier e92ce07916 refactor: Copy To Clipboard Icon component 2025-11-23 23:23:38 +01:00
Aleksander Grygier 219fd19eb8 chore: update webui build output 2025-11-23 23:09:09 +01:00
Aleksander Grygier 41764b8fa0 refactor: Formatters 2025-11-23 22:54:14 +01:00
Aleksander Grygier f8ff39c64e refactor: Cleanup 2025-11-23 22:32:31 +01:00
Aleksander Grygier d5a6671b81 refactor: Cleanup 2025-11-23 22:27:25 +01:00
Aleksander Grygier 49c8062db1 chore: update webui build output 2025-11-23 22:25:34 +01:00
Aleksander Grygier ef5f9d07b0 feat: Improve Model Selector responsiveness 2025-11-23 22:23:50 +01:00
Aleksander Grygier 1c214e9a49 refactor: Enum imports 2025-11-23 22:16:22 +01:00
Aleksander Grygier 48dbef1729 chore: update webui build output 2025-11-23 21:58:38 +01:00
Aleksander Grygier b7ba13b6a0 refactor: Attachments data 2025-11-23 21:46:43 +01:00
Aleksander Grygier 1f0cb3ab26 feat: Use `model` property for displaying the `repo/model-name` naming format 2025-11-23 21:19:00 +01:00
Xuan Son Nguyen d65be9170b address review comments 2025-11-23 19:31:21 +01:00
Xuan Son Nguyen 5ad594e6d6 cleaner 2025-11-23 19:02:07 +01:00
Pascal 0c7220db56
webui: minor settings reorganization and add disable autoscroll option (#17452)
* webui: added a dedicated 'Display' settings section that groups visualization options

* webui: added a Display setting to toggle automatic chat scrolling

* chore: update webui build output
2025-11-23 18:42:00 +01:00
Xuan Son Nguyen 2e355c7f8e oai-compat /models endpoint 2025-11-23 17:25:24 +01:00
Xuan Son Nguyen f95f9c5128 typo docs 2025-11-23 16:14:02 +01:00
Xuan Son Nguyen 74685f4194 allow reusing args if auto_load 2025-11-23 15:42:33 +01:00
Xuan Son Nguyen f927e21ffc support extra_args on loading model 2025-11-23 15:39:03 +01:00
Xuan Son Nguyen 7ef6312f85 add note 2025-11-23 15:08:31 +01:00
Xuan Son Nguyen f25bfaba4d expose args and exit_code in API 2025-11-23 14:59:04 +01:00
Aleksander Grygier 6282537a8b Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-22 23:35:05 +01:00
Aleksander Grygier 036cc939f8 chore: update webui build output 2025-11-22 19:37:43 +01:00
Aleksander Grygier a39ef24c91 feat: Auto-select model from last assistant response 2025-11-22 19:18:32 +01:00
Aleksander Grygier dc913ec424 feat: Chat Form Actions UI logic improvements 2025-11-22 19:06:17 +01:00
Aleksander Grygier db8ed5df9c feat: Model unavailable UI state for model selector 2025-11-22 19:02:50 +01:00
Aleksander Grygier 076eec6d60 feat: Add copy to clipboard to model name in model info dialog 2025-11-22 19:00:05 +01:00
Xuan Son Nguyen 4af1b6cbac Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_maagement_v1_2
Co-authored-by: Aleksander <aleksander.grygier@gmail.com>
2025-11-22 18:39:31 +01:00
Xuan Son Nguyen d32bbfec82 add endpoint docs 2025-11-22 18:01:48 +01:00
Aleksander Grygier c274f132cb refactor: Chat Form Submit component 2025-11-22 01:35:02 +01:00
Xuan Son Nguyen 525e2746df address review comments 2025-11-21 23:25:34 +01:00
Xuan Son Nguyen 7241558835 better --models-dir 2025-11-21 23:06:09 +01:00
Xuan Son Nguyen 7cd929076d remove default model path 2025-11-21 22:33:04 +01:00
Xuan Son Nguyen 62ee883d5a implement LRU 2025-11-21 22:22:57 +01:00
Aleksander Grygier 92585c7173 feat: Attachments UX improvements 2025-11-21 21:23:20 +01:00
Aleksander Grygier 69503aa519 feat: Add auto-mic setting 2025-11-21 21:18:13 +01:00
Aleksander Grygier 6b7c0a5090 chore: update webui build output 2025-11-21 14:27:45 +01:00
Aleksander Grygier 8b1d96755e feat: New Model Selection UX WIP 2025-11-21 14:26:50 +01:00
Xuan Son Nguyen 032b9ff4a9 add --models-dir param 2025-11-21 11:11:01 +01:00
Aleksander Grygier c26c3402fe chore: update webui build output 2025-11-21 11:10:07 +01:00
Aleksander Grygier 049f40dfdf refactor: Use only the message data `model` property for displaying model used info 2025-11-21 11:00:49 +01:00
Aleksander Grygier 45bf2a4983 Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-21 09:25:17 +01:00
Aleksander Grygier cc88f6a75b chore: update webui build output 2025-11-21 00:08:09 +01:00
Aleksander Grygier 4bf82a10f1 feat: Improved UX for model information, modality interactions etc 2025-11-21 00:05:43 +01:00
Xuan Son Nguyen a2e912cf35 address review comment 2025-11-20 21:54:22 +01:00
Xuan Son Nguyen cd5c699304 add docs (first version) 2025-11-20 21:45:05 +01:00
Xuan Son Nguyen be25bccdff address review comment 2025-11-20 21:37:22 +01:00
Xuan Son Nguyen 6929c9f43d address thread safety issue 2025-11-20 18:38:02 +01:00
Xuan Son Nguyen 5369aaa1d6 address most problems 2025-11-20 18:34:22 +01:00
Aleksander Grygier c35dee3bd7 Merge remote-tracking branch 'ngxson/xsn/server_model_management_v1_2' into allozaur/server_model_management_v1_2 2025-11-20 16:36:45 +01:00
Aleksander Grygier 8a88576849 refactor: Architecture improvements 2025-11-20 16:34:25 +01:00
Xuan Son Nguyen 5805ca7960 add is_active() 2025-11-20 16:26:31 +01:00
Xuan Son Nguyen d0ea9e0830 also allow terminate loading model 2025-11-20 16:20:14 +01:00
Xuan Son Nguyen 6610724f8e fix unsafe pointer 2025-11-20 16:13:30 +01:00
Xuan Son Nguyen b9ebdf616a more stable 2025-11-20 15:49:40 +01:00
Aleksander Grygier 55d33a8b8c feat: Model/Router server architecture WIP 2025-11-20 14:24:50 +01:00
Xuan Son Nguyen 919d3f8cbf Merge branch 'master' into xsn/server_model_management_v1_2 2025-11-20 14:19:16 +01:00
Aleksander Grygier 4c91f2633f
Improved file naming & structure for UI components (#17405)
* refactor: Component files naming & structure

* chore: update webui build output

* refactor: Dialog titles + components naming

* chore: update webui build output

* refactor: Imports

* chore: update webui build output
2025-11-20 14:07:31 +01:00
Xuan Son Nguyen 7c6eb17fad fix windows 2025-11-20 13:14:56 +01:00
Georgi Gerganov 196f5083ef
common : more accurate sampling timing (#17382)
* common : more accurate sampling timing

* eval-callback : minor fixes

* cont : add time_meas impl

* cont : fix log msg [no ci]

* cont : fix multiple definitions of time_meas

* llama-cli : exclude chat template init from time measurement

* cont : print percentage of unaccounted time

* cont : do not reset timings
2025-11-20 13:40:10 +02:00
Xuan Son Nguyen 0ef3b61e82 add test 2025-11-20 00:29:59 +01:00
Xuan Son Nguyen 5423d42a35 use subprocess.h, better logging 2025-11-20 00:05:29 +01:00
Xuan Son Nguyen 54b3545791 fix windows build 2025-11-19 22:30:47 +01:00
Xuan Son Nguyen abc0ca478a does this fix windows? 2025-11-19 22:24:00 +01:00
Xuan Son Nguyen 399f536dc7 fix compile error 2025-11-19 21:33:44 +01:00
Xuan Son Nguyen fc5901a449 server: add model management and proxy 2025-11-19 21:23:00 +01:00
Aleksander Grygier 99c53d6558
webui: Add a "Continue" Action for Assistant Message (#16971)
* feat: Add "Continue" action for assistant messages

* feat: Continuation logic & prompt improvements

* chore: update webui build output

* feat: Improve logic for continuing the assistant message

* chore: update webui build output

* chore: Linting

* chore: update webui build output

* fix: Remove synthetic prompt logic, use the prefill feature by sending the conversation payload ending with assistant message

* chore: update webui build output

* feat: Enable "Continue" button based on config & non-reasoning model type

* chore: update webui build output

* chore: Update packages with `npm audit fix`

* fix: Remove redundant error

* chore: update webui build output

* chore: Update `.gitignore`

* fix: Add missing change

* feat: Add auto-resizing for Edit Assistant/User Message textareas

* chore: update webui build output
2025-11-19 14:39:50 +01:00
o7si 97cb3fd5ae
fix: resolve undefined variable 'svr' compilation error (#17348) 2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen 0de8878c96
server: split HTTP into its own interface (#17216)
* server: split HTTP into its own interface

* move server-http and httplib to its own file

* add the remaining endpoints

* fix exception/error handling

* renaming

* missing header

* fix missing windows header

* fix error responses from http layer

* fix slot save/restore handler

* fix case where only one stream chunk is returned

* add NOMINMAX

* do not call sink.write on empty data

* use safe_json_to_str for SSE

* clean up

* add some comments

* improve usage of next()

* bring back the "server is listening on" message

* more generic handler

* add req.headers

* move the chat template print to init()

* add req.path

* cont : minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-17 22:05:44 +01:00
Georgi Gerganov 5b2093becc
server : handle context overflow during decode (#17267)
* server : handle context overflow during decode

* server : minor refactor
2025-11-16 09:23:37 +02:00
Aleksander Grygier 22e1ce2f81
webui: Fix clickability around chat processing statistics UI (#17278)
* fix: Better pointer events handling in chat processing info elements

* chore: update webui build output
2025-11-15 22:41:41 +01:00
Pascal 1411d9275a
webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618)
* webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI

- Purely visual and diagnostic change, no effect on model context, prompt
  construction, or inference behavior

- Captured assistant tool call payloads during streaming and non-streaming
  completions, and persisted them in chat state and storage for downstream use

- Exposed parsed tool call labels beneath the assistant's model info line
  with graceful fallback when parsing fails

- Added tool call badges beneath assistant responses that expose JSON tooltips
  and copy their payloads when clicked, matching the existing model badge styling

- Added a user-facing setting to toggle tool call visibility to the Developer
  settings section directly under the model selector option

* webui: remove scroll listener causing unnecessary layout updates (model selector)

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

* chore: npm run format & update webui build output

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
2025-11-15 21:09:32 +01:00
Ankur Verma c7b7db0445
mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277) 2025-11-15 12:41:16 +01:00
Xuan-Son Nguyen 9b17d74ab7
mtmd: add mtmd_log_set (#17268) 2025-11-14 15:56:19 +01:00
Georgi Gerganov d396b43748
server : fix "can batch with" bug (#17263) 2025-11-14 14:03:45 +02:00
Aleksander Grygier f1bad23f88
Better UX for handling multiple attachments in WebUI (#17246) 2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen c4abcb2457
server: fixing naming conflict res_error (#17243) 2025-11-13 20:53:47 +01:00
Aleksander Grygier 8e878f0cb4
Update packages + upgrade Storybook to v10 (#17201)
* chore: Update packages + upgrade Storybook to v10

* fix: Increase timeout for UI tests
2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen 00c94083b3
server: (refactor) implement generator-based API for task results (#17174)
* server: (refactor) implement generator-based API for task results

* improve

* moving some code

* fix "Response ended prematurely"

* add sink.done before return false

* rm redundant check

* rm unused var

* rename generator --> reader
2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen ee8dd5c658
server: move res_error/res_ok to static function (#17167) 2025-11-12 14:17:24 +01:00
Adrien Gallouët 78010a0d52
cmake : move OpenSSL linking to vendor/cpp-httplib (#17177)
* cmake : move OpenSSL linking to vendor/cpp-httplib

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* bring back httplib 0.27.0

* add -DLLAMA_HTTPLIB

* update cmake config for visionos

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen 1d45b4228f
vendor: split httplib to cpp/h files (#17150)
* vendor: split httplib to cpp/h files

* move defines

* include httplib if curl is not used

* add TODO

* fix build ios

* fix build visionos instead
2025-11-11 13:32:58 +01:00
Mike Abbott 4a5b8aff40
cmake : add version to all shared object files (#17091)
When compiling llama.cpp in Yocto, it fails QA checks because the generated .so files aren't versioned. This applies a version to all generated .so files, allowing the package to build without errors.
2025-11-11 13:19:50 +02:00
Nicolas B. Pierron d2d626938a
Install rpc-server when GGML_RPC is ON. (#17149) 2025-11-11 10:53:59 +00:00
Gabe Goodhart 0c74f32632
memory: Hybrid context shift (#17009)
* feat(memory): Only fail partial erasure of recurrent tail

The recurrent state is always assumed to be the state as of the last update
from the final token in the sequence. When doing a partial erasure, if the
range does not include the final token, the erasure can be considered a
success since any memory used for the sequence prior to the final token
(which is no memory) has been successfully removed.

There is one potential case that this doesn't address, which is the pruning
of cache to remove sensitive data from the context. This wouldn't work for
attention cache partial removal (in the middle) either since the KV state
is linearly-dependent and states in later sequence positions would still be
based on the state from the sensitive data, even if that data is no longer
cached, so I don't think this is relevant, but it is worth noting that the
semantics of this change for a partial erasure in the middle of the cache
are essentially "my context is already compressed" and not "all trace of
the removed tokens has been removed."

https://github.com/ggml-org/llama.cpp/issues/16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(main): Check the output of seq_rm for prefix matching

This prefix matching is explicitly attempting to remove the tokens at the
end of the sequence that don't match. This is the operation that can't be
performed on a recurrent cache due to the state being updated in place, so
if this removal fails, we need to clear the whole cache.

https://github.com/ggml-org/llama.cpp/issues/16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

* fix(memory): Fix condition for partial erasure failure if p0 > pos

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Co-authored-by: compilade <git@compilade.net>

* style: Fix extra parens

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix(main.cpp): Set n_matching_session_tokens to 0 on cache clear

https://github.com/ggml-org/llama.cpp/issues/16768
Branch: HybridContextShift-16768

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>

---------

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: compilade <git@compilade.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-10 17:14:23 +02:00