# llama-server-simulator

Standalone Python script simulating a llama-server HTTP endpoint for testing.

## Features

- HTTP server with an OpenAI-compatible `/v1/chat/completions` endpoint
- AIME dataset integration: loads 90 questions from HuggingFace
- Intelligent question matching: exact matching, LaTeX removal, and Levenshtein distance
- Configurable success rate: controls correct/wrong answer generation (0-1)
- Debug logging for troubleshooting matching issues

## Usage

```bash
python llama-server-simulator.py --success-rate 0.8
```

## Arguments

- `--success-rate`: Probability of returning the correct answer (0.0-1.0, default: 0.8)
- `--port`: Server port (default: 8033)
- `--debug`: Enable debug logging (default: off)

## Testing

```bash
./test-simulator.sh
```

## Implementation Details

- Uses Levenshtein distance for partial matching (threshold: 0.3)
- Automatic caching via the HuggingFace `datasets` library
- Wrong answers generated by incrementing the expected answer
- Debug output written to stderr
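The matching pipeline described above (exact match, LaTeX removal, then Levenshtein distance with a 0.3 threshold) can be sketched as follows. This is a minimal illustration, not the simulator's actual code: the function names (`strip_latex`, `match_question`), the normalization rules, and the use of a length-normalized distance ratio are all assumptions.

```python
import re


def strip_latex(text: str) -> str:
    """Remove common LaTeX markup before comparison (simplified; the
    simulator's real normalization may differ)."""
    text = re.sub(r"\\[a-zA-Z]+\{([^}]*)\}", r"\1", text)  # \cmd{arg} -> arg
    text = re.sub(r"[\\${}^_]", "", text)                   # stray markup chars
    return re.sub(r"\s+", " ", text).strip().lower()


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def match_question(incoming: str, known: list[str], threshold: float = 0.3):
    """Return the best-matching known question, or None if nothing is close.
    A candidate matches when its edit distance divided by the longer
    string's length is at or below the threshold."""
    norm_in = strip_latex(incoming)
    best, best_ratio = None, threshold
    for q in known:
        norm_q = strip_latex(q)
        if norm_in == norm_q:
            return q  # exact match after normalization wins immediately
        ratio = levenshtein(norm_in, norm_q) / max(len(norm_in), len(norm_q), 1)
        if ratio <= best_ratio:
            best, best_ratio = q, ratio
    return best
```

With this scheme, minor formatting differences (whitespace, LaTeX wrappers, trailing punctuation) still resolve to the right dataset question, while unrelated prompts fall outside the 0.3 ratio and return no match.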