# examples.openai: OpenAI API-compatible server
A simple Python server that sits in front of the C++ [../server](../server) and offers improved OpenAI API compatibility.
## Usage

```bash
python -m examples.openai -m some-model.gguf
```
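Once running, the server speaks the OpenAI chat completions protocol. Below is a minimal client-side sketch; the endpoint URL, port, and the `get_weather` tool are assumptions for illustration (check the server's startup output for the actual address):

```python
import json
import urllib.request

# Assumed endpoint; the actual host/port depend on how the server was started.
URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(messages, tools=None, response_format=None):
    """Build an OpenAI-style chat completion payload (sketch)."""
    payload = {"model": "some-model.gguf", "messages": messages}
    if tools is not None:
        payload["tools"] = tools
    if response_format is not None:
        payload["response_format"] = response_format
    return payload

payload = build_chat_request(
    [{"role": "user", "content": "What is the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
)

# To actually send it (requires the server to be running):
# req = urllib.request.Request(URL, json.dumps(payload).encode(),
#                              {"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
print(json.dumps(payload, indent=2))
```

Because the server aims for OpenAI compatibility, existing OpenAI client libraries pointed at the local base URL should also work.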
## Features

The new `examples/openai/server.py`:

- Uses the llama.cpp C++ server as a backend (spawns it or connects to an existing instance)
- Uses the actual jinja2 chat templates read from the models
- Supports grammar-constrained output for both the JSON response format and tool calls
- Tool calling "works" w/ all models (even non-specialized ones like Mixtral 8x7B)
- Optimised support for Functionary & Nous Hermes, easy to extend to other tool-calling fine-tunes
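To illustrate the grammar-constrained approach, here is a toy sketch (not the actual converter shipped in `ts_converter.py`) that builds a GBNF-like grammar accepting only tool calls whose `"name"` field is one of the declared tools:

```python
# Toy sketch of grammar-constrained tool calling: build a GBNF-style grammar
# that only accepts {"name": <one of the declared tools>, "arguments": <string>}.
# Simplified for illustration; the real converter handles full JSON schemas.

def tool_call_grammar(tool_names):
    # Each tool name becomes a quoted JSON-string literal alternative.
    name_alts = " | ".join('"\\"%s\\""' % n for n in tool_names)
    return "\n".join([
        'root ::= "{" ws "\\"name\\":" ws name ws "," ws'
        ' "\\"arguments\\":" ws string ws "}"',
        "name ::= %s" % name_alts,
        'string ::= "\\"" [^"]* "\\""',
        "ws ::= [ \\t\\n]*",
    ])

grammar = tool_call_grammar(["get_weather", "search_web"])
print(grammar)
```

Constraining decoding this way is what lets tool calling "work" even with models that were never fine-tuned for it: the sampler simply cannot emit an invalid tool name or malformed JSON.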
## TODO

- Embedding endpoint w/ distinct server subprocess
- Automatic/manual session caching
  - Spawns the main C++ CLI under the hood
  - Support precaching long prompts from CLI
  - Instant incremental inference in long threads
- Improve examples/agent:
  - Interactive agent CLI that auto-discovers tools from OpenAPI endpoints
  - Script that wraps any Python source as a container-sandboxed OpenAPI endpoint (allowing running ~unsafe code w/ tools)
  - Basic memory / RAG / Python interpreter tools
- Follow-ups
  - Remove OAI support from server
  - Remove non-Python JSON schema to grammar converters
  - Reach out to frameworks to advertise the new option