llama-server-simulator
Standalone Python script simulating llama-server HTTP endpoint for testing.
Features
- HTTP Server - OpenAI-compatible `/v1/chat/completions` endpoint
- AIME Dataset Integration - Loads 90 questions from HuggingFace
- Intelligent Question Matching - Uses exact matching, LaTeX removal, and Levenshtein distance
- Configurable Success Rate - Control the probability of a correct vs. wrong answer (0.0-1.0)
- Debug Logging - Troubleshoot matching issues
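The endpoint mimics the OpenAI chat completions response shape. A minimal sketch of a response builder, assuming a hypothetical helper name `make_completion_response` and model tag `llama-sim` (neither is confirmed by the script):

```python
import time
import uuid

def make_completion_response(answer: str, model: str = "llama-sim") -> dict:
    # Hypothetical helper: shapes the reply like OpenAI's /v1/chat/completions,
    # with a single assistant choice carrying the (correct or wrong) answer.
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": answer},
                "finish_reason": "stop",
            }
        ],
    }
```

Any OpenAI-compatible client that parses `choices[0].message.content` should accept a payload of this shape.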
Usage
python llama-server-simulator.py --success-rate 0.8
Arguments
- `--success-rate`: Probability of returning the correct answer (0.0-1.0, default: 0.8)
- `--port`: Server port (default: 8033)
- `--debug`: Enable debug logging (default: False)
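A sketch of an `argparse` setup matching these flags; the names and defaults come from this README, while the parser structure itself is an assumption about the script's internals:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the CLI described above; flag names and defaults per the README.
    p = argparse.ArgumentParser(description="llama-server simulator")
    p.add_argument("--success-rate", type=float, default=0.8,
                   help="Probability of returning the correct answer (0.0-1.0)")
    p.add_argument("--port", type=int, default=8033,
                   help="Server port")
    p.add_argument("--debug", action="store_true",
                   help="Enable debug logging")
    return p
```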
Testing
./test-simulator.sh
Implementation Details
- Uses Levenshtein distance for partial matching (threshold: 0.3)
- Automatic caching via HuggingFace datasets library
- Wrong answers generated by incrementing expected answer
- Debug output written to stderr
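The matching pipeline above (exact match after LaTeX removal, then Levenshtein distance with a 0.3 threshold, wrong answers by incrementing) can be sketched as follows; the helper names and the exact normalization regex are assumptions, not the script's verbatim code:

```python
import re

def normalize(text: str) -> str:
    # Strip simple LaTeX markup (\commands, $, braces) and collapse whitespace.
    text = re.sub(r"\\[a-zA-Z]+|[${}]", "", text)
    return " ".join(text.split()).lower()

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, one row at a time.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_match(query: str, candidate: str, threshold: float = 0.3) -> bool:
    # Exact match after normalization, else length-relative edit distance.
    q, c = normalize(query), normalize(candidate)
    if q == c:
        return True
    return levenshtein(q, c) / max(len(q), len(c), 1) <= threshold

def wrong_answer(expected: str) -> str:
    # AIME answers are small integers, so "wrong" is expected + 1.
    return str(int(expected) + 1)
```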