server: (doc) clarify in-scope and out-scope features (#20794)
* server: (doc) clarify in-scope and out-scope features
* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
This commit is contained in:
parent e06c3ab2bc
commit fb78ad29bb
@@ -178,6 +178,8 @@ Maintainers reserve the right to decline review or close pull requests for any r
- New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces.
_(NOTE: for legacy reasons, existing code is not required to follow this guideline)_

- For changes to the server, please make sure to refer to the [server development documentation](./tools/server/README-dev.md)

# Documentation

- Documentation is a community effort
@@ -4,6 +4,36 @@ This document provides an in-depth technical overview of `llama-server`, intende
If you are an end user consuming `llama-server` as a product, please refer to the main [README](./README.md) instead.

## Scope of features

In-scope feature types:

- Backend:
  - Basic inference features: text completion, embeddings output
  - Chat-oriented features: chat completion, tool calling
  - Third-party API compatibility, e.g. OAI-compat, Anthropic-compat
  - Multimodal input/output
  - Memory management: save/load state, context checkpoints
  - Model management
  - Features that are required by the Web UI
- Frontend:
  - Chat-oriented features, for example: basic chat, image upload, message editing
  - Agentic features, for example: MCP
  - Model management

Note: For security reasons, features that require reading or writing external files must be **disabled by default**. This covers features such as MCP and model save/load.
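
As a concrete illustration of the chat-oriented and OAI-compat items above, here is a minimal sketch of how a client might target the server's OpenAI-compatible chat endpoint. The base URL and port are assumptions (8080 is the default listen port at the time of writing); adjust them to your deployment.

```python
# Sketch: build a request for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint. BASE_URL is an assumption; change it
# to match where your llama-server instance is listening.
BASE_URL = "http://localhost:8080"

def build_chat_request(messages, stream=False):
    """Return (url, payload) for a POST to the OAI-compat chat endpoint."""
    url = f"{BASE_URL}/v1/chat/completions"
    payload = {"messages": messages, "stream": stream}
    return url, payload

url, payload = build_chat_request([{"role": "user", "content": "Hello!"}])
print(url)  # http://localhost:8080/v1/chat/completions

# To actually send it (requires a running llama-server):
#   import json, urllib.request
#   req = urllib.request.Request(url, json.dumps(payload).encode(),
#                                {"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```

Because the endpoint follows the OpenAI wire format, any OAI-compatible client library can be pointed at it by overriding the base URL.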

Out-of-scope features:

- Backend:
  - Features that require a loop of external API calls, e.g. a server-side agentic loop. This is because external API calls in C++ are costly to maintain. Any complex third-party logic should be implemented outside of the server code.
  - Features that expose the internal state of the model via the API, for example retrieving intermediate activations. This is because llama.cpp does not provide a stable API for this, and relying on `eval_callback` is complicated to maintain, as that API is not intended for multi-sequence setups.
  - Model-specific features. All API calls and features must remain model-agnostic.
- Frontend:
  - Third-party plugins: maintaining a public plugin API for such features is costly. Instead, users can build their own MCP server for their needs.
  - Customizable themes: these are also costly to maintain. While we do care about aesthetics, we aim to achieve this by perfecting a small set of themes.
  - Browser-specific features, for example: [Chrome's built-in AI API](https://developer.chrome.com/docs/ai/built-in-apis).
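
Since server-side agentic loops are out of scope, that logic belongs in the client: the server only answers chat completion requests, and the client drives the tool-calling loop. A minimal sketch of such a client-side loop follows; `call_server` is a stub standing in for a POST to the chat completion endpoint, and the tool and response shapes are simplified for illustration.

```python
# Sketch of a client-side agentic loop. The server stays a plain chat
# endpoint; tool execution and looping happen entirely in the client.
def call_server(messages):
    # Stub: pretend the model requests a tool once, then answers.
    # A real client would POST `messages` to /v1/chat/completions.
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant",
                "tool_calls": [{"name": "get_time", "arguments": {}}]}
    return {"role": "assistant", "content": "It is noon."}

TOOLS = {"get_time": lambda **kw: "12:00"}  # client-side tool registry

def agent_loop(user_prompt, max_turns=4):
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_turns):
        reply = call_server(messages)
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]          # final answer, loop ends
        for call in calls:                   # run each tool locally
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_turns")

print(agent_loop("What time is it?"))  # prints: It is noon.
```

Keeping this loop out of the server is exactly the design choice stated above: complex third-party orchestration stays in client code, where it is cheap to iterate on.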

## Backend

### Overview