llama-server-simulator

A standalone Python script that simulates the llama-server HTTP endpoint for testing.

Features

  • HTTP Server with OpenAI-compatible /v1/chat/completions endpoint
  • AIME Dataset Integration - Loads 90 questions from HuggingFace
  • Intelligent Question Matching - Uses exact matching, LaTeX removal, and Levenshtein distance
  • Configurable Success Rate - Controls the probability of returning a correct vs. wrong answer (0-1)
  • Debug Logging - Troubleshoot matching issues
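The question-matching pipeline above first normalizes the incoming text before comparing it against the dataset. A minimal sketch of what LaTeX removal and normalization might look like (function name and exact regexes are assumptions, not the script's actual code):

```python
import re

def normalize_question(text: str) -> str:
    """Strip LaTeX markup and normalize whitespace before matching.

    Hypothetical sketch; the simulator's actual normalization may differ.
    """
    # Drop math-mode delimiters ($...$, $$...$$).
    text = re.sub(r"\$+", "", text)
    # Drop LaTeX commands, optionally consuming one brace group (\frac{1} -> "").
    text = re.sub(r"\\[a-zA-Z]+\*?(\{[^}]*\})?", "", text)
    # Remove any leftover braces.
    text = re.sub(r"[{}]", "", text)
    # Collapse whitespace and lowercase for case-insensitive exact matching.
    return re.sub(r"\s+", " ", text).strip().lower()
```

Exact matching then compares normalized strings directly; only on a miss does the matcher fall back to Levenshtein distance.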

Usage

python llama-server-simulator.py --success-rate 0.8
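Once running, the simulator accepts standard OpenAI-style chat requests on port 8033. A sketch of the request body a client would POST to /v1/chat/completions (the model name and question text are placeholders; the simulator matches on the question text, not the model field):

```python
import json

# Example OpenAI-compatible request body for the simulator
# (POST http://localhost:8033/v1/chat/completions).
request_body = {
    "model": "simulator",  # placeholder; ignored for answer selection
    "messages": [
        {"role": "user", "content": "AIME question text goes here"}
    ],
}
payload = json.dumps(request_body)
```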

Arguments

  • --success-rate: Probability of returning correct answer (0.0-1.0, default: 0.8)
  • --port: Server port (default: 8033)
  • --debug: Enable debug logging (default: False)
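The --success-rate flag controls a simple coin flip per request. A minimal sketch of the likely logic, combining it with the wrong-answer scheme described under Implementation Details (the function name is hypothetical):

```python
import random

def simulate_answer(expected: str, success_rate: float = 0.8) -> str:
    """Return the expected answer with probability `success_rate`;
    otherwise return a wrong answer by incrementing the expected one.

    Hypothetical sketch of the simulator's internals.
    """
    if random.random() < success_rate:
        return expected
    # AIME answers are integers, so incrementing always yields a wrong answer.
    return str(int(expected) + 1)
```

With --success-rate 1.0 every answer is correct; with 0.0 every answer is off by one.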

Testing

Run the bundled test script:

./test-simulator.sh

Implementation Details

  • Uses Levenshtein distance for partial matching (threshold: 0.3)
  • Automatic caching via HuggingFace datasets library
  • Wrong answers generated by incrementing expected answer
  • Debug output written to stderr
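The partial-matching step can be sketched as a normalized edit distance checked against the 0.3 threshold mentioned above (function names are assumptions; the script may use a library implementation instead of this pure-Python one):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_partial_match(a: str, b: str, threshold: float = 0.3) -> bool:
    """Accept when edit distance, normalized by the longer string,
    falls below the threshold (0.3 per the details above)."""
    if not a and not b:
        return True
    return levenshtein(a, b) / max(len(a), len(b)) < threshold
```

Normalizing by the longer string keeps the ratio in [0, 1], so the threshold behaves consistently across short and long questions.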