From fb78ad29bbe7ae00619b2ce31b0a71e95fdbfc43 Mon Sep 17 00:00:00 2001
From: Xuan-Son Nguyen
Date: Fri, 20 Mar 2026 14:03:50 +0100
Subject: [PATCH] server: (doc) clarify in-scope and out-scope features
 (#20794)

* server: (doc) clarify in-scope and out-scope features

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov

---------

Co-authored-by: Georgi Gerganov
---
 CONTRIBUTING.md            |  2 ++
 tools/server/README-dev.md | 30 ++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index fc26289aec..52898eef8a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -178,6 +178,8 @@ Maintainers reserve the right to decline review or close pull requests for any r
 
 - New code should follow the guidelines (coding, naming, etc.) outlined in this document. Exceptions are allowed in isolated, backend-specific parts of the code that do not interface directly with the `ggml` interfaces. _(NOTE: for legacy reasons, existing code is not required to follow this guideline)_
 
+- For changes in the server, please make sure to refer to the [server development documentation](./tools/server/README-dev.md)
+
 # Documentation
 
 - Documentation is a community effort
diff --git a/tools/server/README-dev.md b/tools/server/README-dev.md
index 3fea3042f7..5f82e35d6c 100644
--- a/tools/server/README-dev.md
+++ b/tools/server/README-dev.md
@@ -4,6 +4,36 @@ This document provides an in-depth technical overview of `llama-server`, intende
 
 If you are an end user consuming `llama-server` as a product, please refer to the main [README](./README.md) instead.
 
+## Scope of features
+
+In-scope types of features:
+
+- Backend:
+  - Basic inference features: text completion, embeddings output
+  - Chat-oriented features: chat completion, tool calling
+  - Third-party API compatibility, e.g. OAI-compat, Anthropic-compat
+  - Multimodal input/output
+  - Memory management: save/load state, context checkpoints
+  - Model management
+  - Features that are required by the Web UI
+- Frontend:
+  - Chat-oriented features, for example: basic chat, image upload, editing messages
+  - Agentic features, for example: MCP
+  - Model management
+
+Note: For security reasons, features that require reading or writing external files must be **disabled by default**. This covers features such as MCP and model save/load.
+
+Out-of-scope features:
+
+- Backend:
+  - Features that require a loop of external API calls, e.g. a server-side agentic loop. External API calls in C++ are costly to maintain; any complex third-party logic should be implemented outside of the server code.
+  - Features that expose the internal state of the model via the API, for example retrieving intermediate activations. llama.cpp doesn't provide a stable API for this, and relying on `eval_callback` is complicated to maintain, as that API is not intended for multi-sequence setups.
+  - Model-specific features. All API calls and features must remain model-agnostic.
+- Frontend:
+  - Third-party plugins: maintaining a public plugin API for such features is costly. Instead, users can build their own MCP server for their needs.
+  - Customizable themes: these are also costly to maintain. While we do care about aesthetics, we aim to achieve this by perfecting a small set of themes.
+  - Browser-specific features, for example [Chrome's built-in AI APIs](https://developer.chrome.com/docs/ai/built-in-apis).
+
 ## Backend
 
 ### Overview