diff --git a/docs/backend/GGML-VirtGPU/configuration.md b/docs/backend/GGML-VirtGPU/configuration.md new file mode 100644 index 0000000000..5f1d1e9e1b --- /dev/null +++ b/docs/backend/GGML-VirtGPU/configuration.md @@ -0,0 +1,175 @@ +# GGML-VirtGPU Backend Configuration + +This document describes the environment variables used by the ggml-virtgpu backend system, covering both the frontend (guest-side) and backend (host-side) components. + +## Environment Variables Overview + +The ggml-virtgpu backend uses environment variables for configuration across three main components: +- **Frontend (Guest)**: GGML applications running in VMs +- **Hypervisor**: Virglrenderer/APIR system +- **Backend (Host)**: Host-side GGML backend integration + +## Frontend (Guest-side) Configuration + +### GGML_REMOTING_USE_APIR_CAPSET +- **Location**: `ggml/src/ggml-virtgpu/virtgpu.cpp` +- **Type**: Boolean flag (presence-based) +- **Purpose**: Controls which virtio-gpu capability set to use for communication +- **Values**: + - Set (any value): Use the APIR capset (long-term setup) + - Unset: Use the Venus capset (easier for testing with an unmodified hypervisor) +- **Default**: Unset (Venus capset) +- **Usage**: + ```bash + export GGML_REMOTING_USE_APIR_CAPSET=1 # Use APIR capset + # or leave unset for Venus capset + ``` + +## Hypervisor (Virglrenderer/APIR) Configuration + +These environment variables are used during the transition phase for +running with an unmodified hypervisor (not supporting the +VirglRenderer APIR component). They will be removed in the future, and +the hypervisor will instead configure VirglRenderer with the APIR +_Configuration Key_. 
+ +### VIRGL_APIR_BACKEND_LIBRARY +- **Location**: `virglrenderer/src/apir/apir-context.c` +- **Configuration Key**: `apir.load_library.path` +- **Type**: File path string +- **Purpose**: Path to the APIR backend library that virglrenderer should dynamically load +- **Required**: Yes +- **Example**: + ```bash + export VIRGL_APIR_BACKEND_LIBRARY="/path/to/libggml-remotingbackend.so" + ``` + +### VIRGL_ROUTE_VENUS_TO_APIR +- **Location**: `virglrenderer/src/apir/apir-renderer.h` +- **Type**: Boolean flag (presence-based) +- **Purpose**: Temporary workaround that routes Venus capset calls to APIR during the hypervisor transition period +- **Status**: Will be removed once hypervisors support APIR natively +- **Warning**: Breaks normal Vulkan/Venus functionality +- **Usage**: + ```bash + export VIRGL_ROUTE_VENUS_TO_APIR=1 # For testing with an unmodified hypervisor + ``` + +### VIRGL_APIR_LOG_TO_FILE +- **Location**: `virglrenderer/src/apir/apir-renderer.c` +- **Environment Variable**: `VIRGL_APIR_LOG_TO_FILE` +- **Type**: File path string +- **Purpose**: Enable debug logging from the VirglRenderer APIR component to the specified file +- **Required**: No (optional debugging) +- **Default**: Logging to `stderr` +- **Usage**: + ```bash + export VIRGL_APIR_LOG_TO_FILE="/tmp/apir-debug.log" + ``` + +## Backend (Host-side) Configuration + +These environment variables are used during the transition phase for +running with an unmodified hypervisor (not supporting the +VirglRenderer APIR component). They will be removed in the future, and +the hypervisor will instead configure VirglRenderer with the APIR +_Configuration Key_. + +### APIR_LLAMA_CPP_GGML_LIBRARY_PATH +- **Location**: `ggml/src/ggml-virtgpu/backend/backend.cpp` +- **Environment Variable**: `APIR_LLAMA_CPP_GGML_LIBRARY_PATH` +- **Configuration Key**: `ggml.library.path` +- **Type**: File path string +- **Purpose**: Path to the actual GGML backend library (Metal, CUDA, Vulkan, etc.) 
+- **Required**: **Yes** - backend initialization fails without this +- **Examples**: + ```bash + # macOS with Metal backend + export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-metal.dylib" + + # Linux with CUDA backend + export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-cuda.so" + + # macOS or Linux with Vulkan backend + export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-vulkan.so" + ``` + +### APIR_LLAMA_CPP_GGML_LIBRARY_REG +- **Location**: `ggml/src/ggml-virtgpu/backend/backend.cpp` +- **Environment Variable**: `APIR_LLAMA_CPP_GGML_LIBRARY_REG` +- **Configuration Key**: `ggml.library.reg` +- **Type**: Function symbol name string +- **Purpose**: Name of the backend registration function to call after loading the library +- **Required**: No (defaults to `ggml_backend_init`) +- **Default**: `ggml_backend_init` +- **Examples**: + ```bash + # Metal backend + export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_metal_reg" + + # CUDA backend + export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_cuda_reg" + + # Vulkan backend + export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_vulkan_reg" + + # Generic fallback (default) + # export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_init" + ``` + +### APIR_LLAMA_CPP_LOG_TO_FILE +- **Location**: `ggml/src/ggml-virtgpu/backend/backend.cpp:62` +- **Environment Variable**: `APIR_LLAMA_CPP_LOG_TO_FILE` +- **Type**: File path string +- **Purpose**: Enable debug logging from the GGML backend to specified file +- **Required**: No (optional debugging) +- **Usage**: + ```bash + export APIR_LLAMA_CPP_LOG_TO_FILE="/tmp/ggml-backend-debug.log" + ``` + +## Configuration Flow + +The configuration system works as follows: + +1. **Hypervisor Setup**: Virglrenderer loads the APIR backend library specified by `VIRGL_APIR_BACKEND_LIBRARY` + +2. 
**Context Creation**: When an APIR context is created, it populates a configuration table with environment variables: + - `apir.load_library.path` ← `VIRGL_APIR_BACKEND_LIBRARY` + - `ggml.library.path` ← `APIR_LLAMA_CPP_GGML_LIBRARY_PATH` + - `ggml.library.reg` ← `APIR_LLAMA_CPP_GGML_LIBRARY_REG` + - `ggml.library.init` ← `APIR_LLAMA_CPP_GGML_LIBRARY_INIT` + - this step will eventually be performed by the hypervisor itself, with command-line arguments instead of environment variables. + +3. **Backend Initialization**: The backend queries the configuration via callbacks: + - `virgl_cbs->get_config(ctx_id, "ggml.library.path")` returns the library path + - `virgl_cbs->get_config(ctx_id, "ggml.library.reg")` returns the registration function + +4. **Library Loading**: The backend dynamically loads and initializes the specified GGML library + +## Error Messages + +Common error scenarios and their messages: + +- **Missing library path**: `"cannot open the GGML library: env var 'APIR_LLAMA_CPP_GGML_LIBRARY_PATH' not defined"` +- **Missing registration function**: `"cannot register the GGML library: env var 'APIR_LLAMA_CPP_GGML_LIBRARY_REG' not defined"` + +## Example Complete Configuration + +Here's an example configuration for a macOS host with Metal backend: + +```bash +# Hypervisor environment +export VIRGL_APIR_BACKEND_LIBRARY="/opt/llama.cpp/lib/libggml-virtgpu-backend.dylib" + +# Backend configuration +export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="/opt/llama.cpp/lib/libggml-metal.dylib" +export APIR_LLAMA_CPP_GGML_LIBRARY_REG="ggml_backend_metal_reg" + +# Optional logging +export VIRGL_APIR_LOG_TO_FILE="/tmp/apir.log" +export APIR_LLAMA_CPP_LOG_TO_FILE="/tmp/ggml.log" + +# Guest configuration +export GGML_REMOTING_USE_APIR_CAPSET=1 +``` diff --git a/docs/backend/GGML-VirtGPU/ggml-virt.md b/docs/backend/GGML-VirtGPU/ggml-virt.md new file mode 100644 index 0000000000..17850e6452 --- /dev/null +++ b/docs/backend/GGML-VirtGPU/ggml-virt.md @@ -0,0 +1,191 @@ +# GGML-VirtGPU 
Backend + +The GGML-VirtGPU backend enables GGML applications to run machine +learning computations on host hardware while the application itself +runs inside a virtual machine. It uses host-guest shared memory to +efficiently share data buffers between the two sides. + +This backend relies on virtio-gpu and the VirglRenderer API Remoting +(APIR) component. The backend is split into two libraries: +- a GGML implementation (the "remoting frontend"), running in the + guest and interacting with the virtgpu device +- a VirglRenderer APIR compatible library (the "remoting backend"), + running in the host and interacting with Virglrenderer and an actual + GGML device backend. + +## Architecture Overview + +The GGML-VirtGPU backend consists of three main components: + +``` +┌─────────────────────────────────────────┐ +│ GGML Application │ +│ (llama.cpp, etc.) │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Guest VM (Frontend) │ +│ ggml-virtgpu library │ +│ │ +│ ┌─────────────────────────────────┐ │ +│ │ GGML Backend Interface │ │ +│ └─────────────────────────────────┘ │ +│ ↓ │ +│ ┌─────────────────────────────────┐ │ +│ │ VirtGPU Communication │ │ +│ │ (hypercalls + shared mem) │ │ +│ └─────────────────────────────────┘ │ +└─────────────────────────────────────────┘ + ↓ + virtio-gpu / virglrenderer APIR + ↓ +┌─────────────────────────────────────────┐ +│ Host System (Backend) │ +│ │ +│ ┌─────────────────────────────────┐ │ +│ │ Backend Dispatcher │ │ +│ └─────────────────────────────────┘ │ +│ ↓ │ +│ ┌─────────────────────────────────┐ │ +│ │ GGML Backend Library │ │ +│ │ (Metal/Vulkan/CPU/...) │ │ +│ └─────────────────────────────────┘ │ +└─────────────────────────────────────────┘ +``` + +### Key Components + +1. **Guest-side Frontend** (`ggml-virtgpu/`): Implements the GGML backend interface and forwards operations to the host +2. 
**Host-side Backend** (`ggml-virtgpu/backend/`): Receives forwarded operations and executes them on actual hardware backends +3. **Communication Layer**: Uses virtio-gpu hypercalls and shared memory for efficient data transfer + +## Features + +- **Dynamic backend loading** on the host side (CPU, CUDA, Metal, etc.) +- **Zero-copy data transfer** via host-guest shared memory pages + +## Communication Protocol + +### Hypercalls and Shared Memory + +The backend uses two primary communication mechanisms: + +1. **Hypercalls (`DRM_IOCTL_VIRTGPU_EXECBUFFER`)**: Trigger remote execution from guest to host +2. **Shared Memory Pages**: Zero-copy data transfer for tensors and parameters + +#### Shared Memory Layout + +Each connection uses the following shared memory buffers: + +- **Data Buffer** (24 MiB): For command/response data and tensor transfers +- **Reply Buffer** (16 KiB): For command replies and status information +- **Data Buffers**: Dynamically allocated host-guest shared buffers + that serve as GGML buffers. 
+ +### APIR Protocol + +The VirglRenderer API Remoting protocol defines three command types: + +- `HANDSHAKE`: Protocol version negotiation and capability discovery +- `LOADLIBRARY`: Dynamic loading of backend libraries on the host +- `FORWARD`: API function call forwarding + +### Binary Serialization + +Commands and data are serialized using a custom binary protocol with: + +- Fixed-size encoding for basic types +- Variable-length arrays with size prefixes +- Buffer bounds checking +- Error recovery mechanisms + +## Supported Operations + +### Device Operations +- Device enumeration and capability queries +- Memory information (total/free) +- Backend type detection + +### Buffer Operations +- Buffer allocation and deallocation +- Tensor data transfer (host ↔ guest) +- Memory copying and clearing + +### Computation Operations +- Graph execution forwarding + +## Build Requirements + +### Guest-side Dependencies +- `libdrm` for DRM/virtio-gpu communication +- C++20 compatible compiler +- CMake 3.14+ + +### Host-side Dependencies +- virglrenderer with APIR support (pending upstream review) +- Target backend libraries (libggml-metal, libggml-vulkan, etc.) 
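The guest-side requirements listed above can be checked before configuring the build. The following is a minimal sketch and not part of the repository: the helper names `version_ge` and `check_guest_deps` are hypothetical, and it assumes that `pkg-config`, a `sort` supporting `-V`, and a `c++` compiler driver are available in the guest.

```bash
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical helper: reports each guest-side requirement and
# returns non-zero if at least one of them is not satisfied.
check_guest_deps() {
    local status=0 cmake_version

    # libdrm development files, located through pkg-config
    if pkg-config --exists libdrm 2>/dev/null; then
        echo "libdrm: ok"
    else
        echo "libdrm: missing"; status=1
    fi

    # CMake 3.14 or newer (first line reads "cmake version X.Y.Z")
    cmake_version=$(cmake --version 2>/dev/null | head -n1 | awk '{print $3}')
    if [ -n "$cmake_version" ] && version_ge "$cmake_version" 3.14; then
        echo "cmake $cmake_version: ok"
    else
        echo "cmake: missing or older than 3.14"; status=1
    fi

    # C++20 support: syntax-check an empty program read from stdin
    if echo 'int main() {}' | c++ -std=c++20 -x c++ -fsyntax-only - 2>/dev/null; then
        echo "C++20 compiler: ok"
    else
        echo "C++20 compiler: missing"; status=1
    fi

    return "$status"
}
```

Calling `check_guest_deps` inside the guest prints one line per requirement; the host-side dependencies are not covered by this sketch.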
+ +## Configuration + +### Environment Variables + +- `GGML_VIRTGPU_BACKEND_LIBRARY`: Path to the host-side backend library +- `GGML_VIRTGPU_DEBUG`: Enable debug logging + +### Build Options + +- `GGML_VIRTGPU`: Enable the VirtGPU backend (`ON` or `OFF`, default: `OFF`) +- `GGML_VIRTGPU_BACKEND`: Build the host-side backend component (`ON`, `OFF` or `ONLY`, default: `OFF`) + +### System Requirements + +- VM with virtio-gpu support +- VirglRenderer with APIR patches +- Compatible backend libraries on host + +## Limitations + +- **VM-specific**: Only works in virtual machines with virtio-gpu support +- **Host dependency**: Requires a properly configured host-side backend +- **Latency**: Each operation adds a small overhead for the guest-to-host transition + +## Development + +### Code Generation + +The backend uses code generation from YAML configuration: + +```bash +# Regenerate protocol code +cd ggml-virtgpu/ +python regenerate_remoting.py +``` + +### Adding New Operations + +1. Add function definition to `ggmlremoting_functions.yaml` +2. Regenerate code with `regenerate_remoting.py` +3. Implement guest-side forwarding in `virtgpu-forward-*.cpp` +4. Implement host-side handling in `backend-dispatched-*.cpp` + +## Current Status + +* This work is pending upstream changes in the VirglRenderer + project. + * The backend can be tested with Virglrenderer compiled from source + using this PR: + https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590 +* This work is pending changes in the VMM/hypervisor running the + virtual machine, which needs to know how to route the newly + introduced APIR capset. + * The environment variable `VIRGL_ROUTE_VENUS_TO_APIR=1` allows + using the Venus capset, until the relevant hypervisors have been + patched. However, setting this flag breaks normal Vulkan/Venus + behavior. + * The environment variable `GGML_REMOTING_USE_APIR_CAPSET` tells the + `ggml-virtgpu` backend to use the APIR capset. 
This will become + the default when the relevant hypervisors have been patched. + +* This work focused on improving the performance of llama.cpp running + in macOS containers, and is mainly tested on this platform. + Linux support (via `krun`) is in progress. diff --git a/docs/backend/GGML-VirtGPU/testing.md b/docs/backend/GGML-VirtGPU/testing.md new file mode 100644 index 0000000000..5a1a64a948 --- /dev/null +++ b/docs/backend/GGML-VirtGPU/testing.md @@ -0,0 +1,196 @@ +### Testing + +This document provides instructions for building and testing the GGML-VirtGPU backend on macOS with containers. + +#### Prerequisites + +The testing setup requires: + +- macOS host system +- Container runtime with `libkrun` provider (podman machine) +- Access to the development patchsets for VirglRenderer + +#### Required Patchsets + +The backend requires patches that are currently under review: + +- **Virglrenderer APIR upstream PR**: https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1590 (for reference) +- **macOS Virglrenderer for krunkit**: https://gitlab.freedesktop.org/kpouget/virglrenderer/-/tree/main-macos + +#### Build Instructions + +##### 1. Build ggml-virtgpu-backend (Host-side, macOS) + +```bash +# Build the backend that runs natively on macOS +mkdir llama.cpp +cd llama.cpp +git clone https://github.com/ggerganov/llama.cpp.git src +cd src + +LLAMA_MAC_BUILD=$PWD/build/ggml-virtgpu-backend + +cmake -S . -B $LLAMA_MAC_BUILD \ + -DGGML_NATIVE=OFF \ + -DLLAMA_CURL=ON \ + -DGGML_VIRTGPU_BACKEND=ONLY \ + -DGGML_METAL=ON + +TARGETS="ggml-metal" +cmake --build $LLAMA_MAC_BUILD --parallel 8 --target $TARGETS + +# Build additional tools for native benchmarking +EXTRA_TARGETS="llama-run llama-bench" +cmake --build $LLAMA_MAC_BUILD --parallel 8 --target $EXTRA_TARGETS +``` + +##### 2. 
Build virglrenderer (Host-side, macOS) + +```bash +# Build virglrenderer with APIR support +mkdir virglrenderer && cd virglrenderer +git clone https://gitlab.freedesktop.org/kpouget/virglrenderer -b main-macos src +cd src + +VIRGL_BUILD_DIR=$PWD/build + +# -Dvenus=true and VIRGL_ROUTE_VENUS_TO_APIR=1 route the APIR requests via the Venus backend, for easier testing without a patched hypervisor + +meson setup $VIRGL_BUILD_DIR \ + -Dvenus=true \ + -Dapir=true + +ninja -C $VIRGL_BUILD_DIR +``` + +##### 3. Build ggml-virtgpu (Guest-side, Linux) + +```bash +# Inside a Linux container +mkdir llama.cpp && cd llama.cpp +git clone https://github.com/ggerganov/llama.cpp.git src +cd src + +LLAMA_LINUX_BUILD=$PWD/build-virtgpu + +cmake -S . -B $LLAMA_LINUX_BUILD \ + -DGGML_VIRTGPU=ON + +cmake --build $LLAMA_LINUX_BUILD +``` + +Alternatively, build a container image with the frontend: + +```bash +cat << EOF > remoting.containerfile +FROM quay.io/fedora/fedora:43 +USER 0 + +WORKDIR /app/remoting + +ARG LLAMA_CPP_REPO="https://github.com/ggerganov/llama.cpp.git" +ARG LLAMA_CPP_VERSION="master" +ARG LLAMA_CPP_CMAKE_FLAGS="-DGGML_VIRTGPU=ON" +ARG LLAMA_CPP_CMAKE_BUILD_FLAGS="--parallel 4" + +RUN dnf install -y git cmake gcc gcc-c++ libcurl-devel libdrm-devel + +RUN git clone "\${LLAMA_CPP_REPO}" src \\ + && git -C src fetch origin \${LLAMA_CPP_VERSION} \\ + && git -C src reset --hard FETCH_HEAD + +RUN mkdir -p build \\ + && cd src \\ + && set -o pipefail \\ + && cmake -S . 
-B ../build \${LLAMA_CPP_CMAKE_FLAGS} \\ + && cmake --build ../build/ \${LLAMA_CPP_CMAKE_BUILD_FLAGS} + +ENTRYPOINT ["/app/remoting/build/bin/llama-server"] +EOF + +mkdir -p empty_dir +podman build -f remoting.containerfile ./empty_dir -t localhost/remoting-frontend +``` + +#### Environment Setup + +##### Set krunkit Environment Variables + +```bash +# Define the base directories (adapt these paths to your system) +VIRGL_BUILD_DIR=$HOME/remoting/virglrenderer/build +LLAMA_MAC_BUILD=$HOME/remoting/llama.cpp/build-backend + +# For krunkit to load the custom virglrenderer library +export DYLD_LIBRARY_PATH=$VIRGL_BUILD_DIR/src + +# For Virglrenderer to load the ggml-virtgpu-backend library +export VIRGL_APIR_BACKEND_LIBRARY="$LLAMA_MAC_BUILD/bin/libggml-virtgpu-backend.dylib" + +# For the ggml-virtgpu-backend library to load the ggml-metal backend +export APIR_LLAMA_CPP_GGML_LIBRARY_PATH="$LLAMA_MAC_BUILD/bin/libggml-metal.dylib" +export APIR_LLAMA_CPP_GGML_LIBRARY_REG=ggml_backend_metal_reg +``` + +##### Launch Container Environment + +```bash +# Set container provider to libkrun +export CONTAINERS_MACHINE_PROVIDER=libkrun +podman machine start +``` + +##### Verify Environment + +Confirm that krunkit is using the correct virglrenderer library: + +```bash +lsof -c krunkit | grep virglrenderer +# Expected output: +# krunkit 50574 user txt REG 1,14 2273912 10849442 ($VIRGL_BUILD_DIR/src)/libvirglrenderer.1.dylib +``` + +#### Running Tests + +##### Launch Test Container + +```bash +# Optional model caching (bind-mounts ./models into the container) +mkdir -p models +PODMAN_CACHE_ARGS="-v $PWD/models:/models --user root:root --cgroupns host --security-opt label=disable -w /models" + +# The image entrypoint is llama-server, so override it to get a shell +podman run $PODMAN_CACHE_ARGS -it --rm --device /dev/dri --entrypoint bash localhost/remoting-frontend +``` + +##### Test llama.cpp in Container + +```bash + +# Run performance benchmark +/app/remoting/build/bin/llama-bench -m ./llama3.2 +``` + +Expected output (performance may vary): +``` +| model | size | params | backend | ngl | test | t/s | +| 
------------------------------ | ---------: | ---------: | ---------- | --: | ------------: | -------------------: | +| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | RemotingFrontend | 99 | pp512 | 991.30 ± 0.66 | +| llama 3B Q4_K - Medium | 1.87 GiB | 3.21 B | RemotingFrontend | 99 | tg128 | 85.71 ± 0.11 | +``` + +#### Troubleshooting + +##### SSH Environment Variable Issues + +⚠️ **Warning**: Setting `DYLD_LIBRARY_PATH` from SSH doesn't work on macOS. Here is a workaround: + +**Workaround 1: Replace system library** +```bash +VIRGL_BUILD_DIR=$HOME/remoting/virglrenderer/build # ⚠️ adapt to your system +BREW_VIRGL_DIR=/opt/homebrew/Cellar/virglrenderer/0.10.4d/lib +VIRGL_LIB=libvirglrenderer.1.dylib + +cd $BREW_VIRGL_DIR +mv $VIRGL_LIB ${VIRGL_LIB}.orig +ln -s $VIRGL_BUILD_DIR/src/$VIRGL_LIB +```
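Because the workaround above renames the Homebrew library, it is worth keeping a way to undo it. The following is a minimal sketch under the same assumptions as the workaround (the original file was saved with the `.orig` suffix and replaced by a symlink); the function name `restore_virgl_lib` is hypothetical.

```bash
# Hypothetical helper: undo the symlink workaround in directory $1 for library $2.
restore_virgl_lib() {
    local dir="$1" lib="$2"

    # Only act when the workaround is in place: a symlink plus the saved original
    if [ -L "$dir/$lib" ] && [ -e "$dir/${lib}.orig" ]; then
        rm "$dir/$lib" && mv "$dir/${lib}.orig" "$dir/$lib"
        echo "restored $dir/$lib"
    else
        echo "nothing to restore in $dir" >&2
        return 1
    fi
}
```

With the variables from the workaround still defined, `restore_virgl_lib "$BREW_VIRGL_DIR" "$VIRGL_LIB"` puts the original library back. The function refuses to touch anything if the symlink or the `.orig` copy is absent.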