llama-server-simulator

A standalone Python script that simulates the llama-server HTTP endpoint for testing.

Features

  • HTTP Server with OpenAI-compatible /v1/chat/completions endpoint
  • AIME Dataset Integration - Loads 90 questions from HuggingFace
  • Intelligent Question Matching - Uses exact matching, LaTeX removal, and Levenshtein distance
  • Configurable Success Rate - Controls the probability of returning a correct vs. wrong answer (0-1)
  • Debug Logging - Troubleshoot matching issues
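The question-matching pipeline above first normalizes the incoming text before comparing it against the dataset. A minimal sketch of what LaTeX removal and normalization might look like (function name and exact regexes are assumptions, not the script's actual code):

```python
import re

def normalize_question(text: str) -> str:
    """Strip LaTeX markup and normalize whitespace before matching.

    Hypothetical sketch; the simulator's actual normalization may differ.
    """
    # Drop math-mode delimiters ($...$, $$...$$).
    text = re.sub(r"\$+", "", text)
    # Drop LaTeX commands, optionally consuming one brace group (\frac{1} -> "").
    text = re.sub(r"\\[a-zA-Z]+\*?(\{[^}]*\})?", "", text)
    # Remove any leftover braces.
    text = re.sub(r"[{}]", "", text)
    # Collapse whitespace and lowercase for case-insensitive exact matching.
    return re.sub(r"\s+", " ", text).strip().lower()
```

Exact matching then compares normalized strings directly; only on a miss does the matcher fall back to Levenshtein distance.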

Usage

python llama-server-simulator.py --success-rate 0.8
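Once running, the simulator accepts standard OpenAI-style chat requests on port 8033. A sketch of the request body a client would POST to /v1/chat/completions (the model name and question text are placeholders; the simulator matches on the question text, not the model field):

```python
import json

# Example OpenAI-compatible request body for the simulator
# (POST http://localhost:8033/v1/chat/completions).
request_body = {
    "model": "simulator",  # placeholder; ignored for answer selection
    "messages": [
        {"role": "user", "content": "AIME question text goes here"}
    ],
}
payload = json.dumps(request_body)
```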

Arguments

  • --success-rate: Probability of returning correct answer (0.0-1.0, default: 0.8)
  • --port: Server port (default: 8033)
  • --debug: Enable debug logging (default: False)
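The --success-rate flag controls a simple coin flip per request. A minimal sketch of the likely logic, combining it with the wrong-answer scheme described under Implementation Details (the function name is hypothetical):

```python
import random

def simulate_answer(expected: str, success_rate: float = 0.8) -> str:
    """Return the expected answer with probability `success_rate`;
    otherwise return a wrong answer by incrementing the expected one.

    Hypothetical sketch of the simulator's internals.
    """
    if random.random() < success_rate:
        return expected
    # AIME answers are integers, so incrementing always yields a wrong answer.
    return str(int(expected) + 1)
```

With --success-rate 1.0 every answer is correct; with 0.0 every answer is off by one.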

Testing

Run the bundled test script:

./test-simulator.sh

Implementation Details

  • Uses Levenshtein distance for partial matching (threshold: 0.3)
  • Automatic caching via HuggingFace datasets library
  • Wrong answers generated by incrementing expected answer
  • Debug output written to stderr
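The partial-matching step can be sketched as a normalized edit distance checked against the 0.3 threshold mentioned above (function names are assumptions; the script may use a library implementation instead of this pure-Python one):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_partial_match(a: str, b: str, threshold: float = 0.3) -> bool:
    """Accept when edit distance, normalized by the longer string,
    falls below the threshold (0.3 per the details above)."""
    if not a and not b:
        return True
    return levenshtein(a, b) / max(len(a), len(b)) < threshold
```

Normalizing by the longer string keeps the ratio in [0, 1], so the threshold behaves consistently across short and long questions.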