# examples.agent: Interactive agent that can use Python tools!

New Python OpenAI API compatibility server, which calls into the C++ server under the hood:

```bash
python -m examples.openai.server --model model.gguf
```

## Prerequisites

Note: To get conda, just install Miniforge (it's OSS): https://github.com/conda-forge/miniforge

```bash
conda create -n agent python=3.11
conda activate agent
pip install -r examples/openai/requirements.txt
```

## Features

The new [examples/openai/server.py](./server.py):

- Supports grammar-constrained tool calling for **all** models (incl. Mixtral 8x7B)
- Optimised support for Functionary & Nous Hermes; easy to extend to other tool-calling schemes
- Generic support w/ a JSON schema that guides the model towards tool usage (at the cost of extra tokens):

  ```ts
  {
    // original_thought: string,
    thought_about_next_step_only: string,
    next_step: {tool_calls: {name: string, arguments: any}} | {result: T}
  }
  // Where T is the output JSON schema, or 'any'
  ```

- Option to expose schemas to models as TypeScript signatures (as for Functionary) or as JSON schema.
- Supports models that require alternating user/assistant turns (like Mixtral Instruct) by merging system messages into user messages.
- Spawns the C++ [llama.cpp server](../server) under the hood (unless passed `--endpoint`), but only uses its non-chat endpoint
  (depending on the prompting strategy, we weave the tool & output schemas, along with the chat template, into the raw model grammar constraints)
- Uses the actual Jinja2 templates stored in the GGUF models
- Will eventually also spawn `whisper.cpp` and another server subprocess for the embeddings endpoint

Rationale: the C++ server lacks some OpenAI compatibility features and can't realistically keep up with prompt templates without bringing in too many dependencies. This new layer lets the C++ server focus on serving efficiency and delegates OpenAI compliance to a layer that is easier to maintain.

## Test

If you want to see tools in action, look at the [agent example](../agent). Otherwise:

Start the server in Terminal 1:

```bash
python -m examples.openai --model ~/AI/Models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
```

Query it in Terminal 2 (or use it from any framework that makes use of tools; note tool calls are guaranteed to comply with the schema, so retries are likely not necessary!):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user'"'"'s location."
            }
          },
          "required": ["location", "format"]
        }
      }
    }, {
      "type": "function",
      "function": {
        "name": "get_n_day_weather_forecast",
        "description": "Get an N-day weather forecast",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "format": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "The temperature unit to use. Infer this from the user'"'"'s location."
            },
            "num_days": {
              "type": "integer",
              "description": "The number of days to forecast"
            }
          },
          "required": ["location", "format", "num_days"]
        }
      }
    }],
    "messages": [
      {"role": "system", "content": "Do not make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."},
      {"role": "user", "content": "what is the weather going to be like in San Francisco and Glasgow over the next 4 days"}
    ]
  }'
```
|
|
|
|
<details>
|
|
<summary>Show output</summary>
|
|
|
|
```json
|
|
{
|
|
"id": "chatcmpl-3095057176",
|
|
"object": "chat.completion",
|
|
"created": 1711726921,
|
|
"model": "gpt-3.5-turbo",
|
|
"choices": [
|
|
{
|
|
"index": 0,
|
|
"message": {
|
|
"role": "assistant",
|
|
"name": null,
|
|
"tool_call_id": null,
|
|
"content": "In order to provide the required information, I need to call the get_n_day_weather_forecast function twice, once for San Francisco and once for Glasgow.",
|
|
"tool_calls": [
|
|
{
|
|
"id": "call_970977",
|
|
"type": "function",
|
|
"function": {
|
|
"name": "get_n_day_weather_forecast",
|
|
"arguments": {
|
|
"location": "San Francisco, CA",
|
|
"format": "celsius",
|
|
"num_days": 4
|
|
}
|
|
}
|
|
}
|
|
]
|
|
},
|
|
"logprobs": null,
|
|
"finish_reason": "tool_calls"
|
|
}
|
|
],
|
|
"usage": {
|
|
"prompt_tokens": 546,
|
|
"completion_tokens": 118,
|
|
"total_tokens": 664
|
|
},
|
|
"system_fingerprint": "...",
|
|
"error": null
|
|
}
|
|
```
|
|
|
|
</details>

## TODO

- Embedding endpoint w/ distinct server subprocess
- Evaluate options for session caching
  - Pass session id & store / read from file?
  - Support parent session ids for trees of thought?
  - Support precaching long prompts from CLI / read session files?
- Follow-ups
  - Remove OAI support from server
  - Remove non-Python json-schema-to-grammar versions
  - Reach out to frameworks to advertise new option.