server: (doc) clarify in-scope and out-scope features (#20794)

* server: (doc) clarify in-scope and out-scope features

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Xuan-Son Nguyen 2026-03-20 14:03:50 +01:00 committed by GitHub
parent e06c3ab2bc
commit fb78ad29bb
GPG Key ID: B5690EEEBB952194
2 changed files with 32 additions and 0 deletions

@@ -178,6 +178,8 @@ Maintainers reserve the right to decline review or close pull requests for any r
- New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces.
_(NOTE: for legacy reasons, existing code is not required to follow this guideline)_
- For server changes, please refer to the [server development documentation](./tools/server/README-dev.md).
# Documentation
- Documentation is a community effort

@@ -4,6 +4,36 @@ This document provides an in-depth technical overview of `llama-server`, intende
If you are an end user consuming `llama-server` as a product, please refer to the main [README](./README.md) instead.
## Scope of features
In-scope feature types:
- Backend:
    - Basic inference features: text completion, embeddings output
    - Chat-oriented features: chat completion, tool calling
    - Third-party API compatibility, e.g. OAI-compat, Anthropic-compat
    - Multimodal input/output
    - Memory management: save/load state, context checkpoints
    - Model management
    - Features that are required by the Web UI
- Frontend:
    - Chat-oriented features, e.g. basic chat, image upload, message editing
    - Agentic features, e.g. MCP
    - Model management
Note: For security reasons, features that require reading or writing external files must be **disabled by default**. This covers features such as MCP and model save/load.
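As a rough illustration of the "disabled by default" rule, the sketch below gates file-touching features behind explicit opt-in flags. All names here (`file_feature_opts`, `feature_enabled`, `handle_save_state`) are hypothetical and are not taken from the `llama-server` code base; this only shows the shape of the pattern.

```cpp
#include <cassert>
#include <string>

// Hypothetical option set; the field names are illustrative and are NOT
// actual llama-server parameters.
struct file_feature_opts {
    bool allow_state_files = false; // save/load to disk: off unless opted in
    bool allow_mcp         = false; // MCP integration: off unless opted in
};

enum class feature { state_files, mcp };

// Returns true only if the operator explicitly enabled the feature.
inline bool feature_enabled(const file_feature_opts & opts, feature f) {
    switch (f) {
        case feature::state_files: return opts.allow_state_files;
        case feature::mcp:         return opts.allow_mcp;
    }
    return false; // anything unrecognized stays disabled
}

// Handler-side guard: refuse the request when the feature is not enabled.
inline std::string handle_save_state(const file_feature_opts & opts) {
    if (!feature_enabled(opts, feature::state_files)) {
        return "error: feature disabled";
    }
    return "ok";
}
```

With a default-constructed `file_feature_opts`, `handle_save_state` rejects the request; it succeeds only after the operator sets `allow_state_files` to `true`.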
Out-of-scope features:
- Backend:
    - Features that require a loop of external API calls, e.g. a server-side agentic loop. External API calls in C++ are costly to maintain, so any complex third-party logic should be implemented outside of the server code.
    - Features that expose the internal state of the model through the API, e.g. returning intermediate activations. llama.cpp does not provide a stable API for this, and relying on `eval_callback` would complicate maintenance, since that API is not intended for use in a multi-sequence setup.
    - Model-specific features. All API calls and features must remain model-agnostic.
- Frontend:
    - Third-party plugins. Maintaining a public plugin API for such features is costly. Instead, users can build their own MCP server for their needs.
    - Customizable themes. These are also costly to maintain. We do care about aesthetics, but we pursue them by perfecting a small set of themes.
    - Browser-specific features, e.g. [Chrome's built-in AI API](https://developer.chrome.com/docs/ai/built-in-apis).
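To illustrate what keeping an agentic loop outside of the server can look like, here is a minimal client-side sketch. The `model_reply` shape and the `chat_fn` / `tool_fn` callbacks are assumptions made for this example: a real client would send the transcript to the server's chat completion API and parse the JSON response, which is omitted here.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical reply shape; a real client would fill this in by parsing
// the response returned by the server.
struct model_reply {
    bool        wants_tool; // true when the model requested a tool call
    std::string content;    // tool argument, or the final answer
};

using chat_fn = std::function<model_reply(const std::vector<std::string> &)>;
using tool_fn = std::function<std::string(const std::string &)>;

// Client-side agentic loop: call the model, run any requested tool locally,
// append the result to the transcript, and repeat up to max_steps times.
inline std::string run_agent(const chat_fn & chat, const tool_fn & tool,
                             std::vector<std::string> transcript, int max_steps) {
    for (int step = 0; step < max_steps; ++step) {
        const model_reply reply = chat(transcript);
        if (!reply.wants_tool) {
            return reply.content; // final answer, stop looping
        }
        transcript.push_back("tool: " + tool(reply.content));
    }
    return ""; // step budget exhausted without a final answer
}
```

In this arrangement the server only answers individual chat requests, while the loop, the step budget, and the tool execution all live in the client.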
## Backend
### Overview