llama-server-simulator Implementation Plan
Overview
Create a standalone Python script that simulates a llama-server HTTP endpoint for testing the eval script.
Goals
- Simulate llama-server's /v1/chat/completions endpoint
- Accept requests and respond with expected answers from the AIME dataset
- Implement configurable success rate (sometimes right, sometimes wrong)
- Use regex matching to find questions in incoming requests
- Test with curl requests before integrating with eval script
Implementation Plan
Phase 1: Basic Simulator Structure
- Create llama-server-simulator.py script
- Set up Flask/FastAPI HTTP server
- Implement /v1/chat/completions endpoint
- Handle basic request/response format
Phase 2: AIME Dataset Integration
- Load AIME dataset
- Store questions and expected answers
- Implement regex matching to find questions in incoming requests
- Extract expected answer from matched question
Phase 3: Response Generation
- Implement success rate configuration
- Randomly determine if response should be correct or incorrect
- Generate appropriate response based on success determination
- Format response in OpenAI-compatible format
Phase 4: Testing
- Write curl commands to test basic functionality
- Test correct responses
- Test incorrect responses
- Test edge cases (no question found, etc.)
Technical Details
Server Framework
- Use Flask for simplicity
- Listen on configurable port
- Support JSON request/response format
Request Format
{
  "model": "llama",
  "messages": [
    {"role": "user", "content": "Question text here"}
  ],
  "temperature": 0,
  "max_tokens": 2048
}
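The request body above can be reduced to the text the simulator has to search. The following is a minimal sketch, assuming the question sits in the last user message and that its content is a plain string; the helper name extract_user_text is hypothetical.

```python
import json
from typing import Optional


def extract_user_text(payload: dict) -> Optional[str]:
    """Return the content of the last user message, or None if there is none."""
    messages = payload.get("messages")
    if not isinstance(messages, list):
        return None
    for message in reversed(messages):
        if isinstance(message, dict) and message.get("role") == "user":
            content = message.get("content")
            # Assumption: content is a plain string, as in the example request.
            return content if isinstance(content, str) else None
    return None


body = json.loads(
    '{"model": "llama", "messages": [{"role": "user", "content": "Question text here"}]}'
)
print(extract_user_text(body))
```

Returning None for a malformed body lets the endpoint decide how to report the error instead of raising inside the helper.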
Response Format
{
  "id": "chatcmpl-xxx",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "llama",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Answer text here"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 50,
    "total_tokens": 150
  }
}
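One way to produce this response shape is a small builder function. This is a sketch under stated assumptions: the name build_chat_response is hypothetical, and the token counts are whitespace-based placeholders because the simulator has no tokenizer.

```python
import time
import uuid


def build_chat_response(answer_text: str, prompt_text: str = "", model: str = "llama") -> dict:
    """Wrap answer_text in the chat.completion structure shown above."""
    # Assumption: word counts stand in for real token counts.
    prompt_tokens = len(prompt_text.split())
    completion_tokens = len(answer_text.split())
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": answer_text},
                "finish_reason": "stop",
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

The returned dict can be passed directly to jsonify in the Flask handler.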
AIME Dataset Integration
- Load from HuggingFace: "AI-MO/aimo-validation-aime"
- Store in memory for fast lookup
- Regex pattern to find question text in request
- Extract answer from matched question
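The lookup described above could be sketched as follows. The rows here are hypothetical stand-ins rather than real AIME problems, and the field names "problem" and "answer" are an assumption about the dataset schema that should be checked against the loaded data. Whitespace and case are normalized so that prompt templating around the question does not prevent a match.

```python
import re

# Hypothetical stand-in rows; real rows would come from the loaded dataset.
# Assumption: each row exposes "problem" and "answer" fields.
SAMPLE_ROWS = [
    {"problem": "Find the sum of all positive integers n such that n divides 12.", "answer": "28"},
    {"problem": "How many primes are less than 20?", "answer": "8"},
]


def normalize(text):
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def build_index(rows):
    """Compile one escaped regex per normalized question, paired with its answer."""
    return [
        (re.compile(re.escape(normalize(row["problem"]))), str(row["answer"]))
        for row in rows
    ]


def lookup_answer(index, request_text):
    """Return the answer for the first question found inside request_text, else None."""
    haystack = normalize(request_text)
    for pattern, answer in index:
        if pattern.search(haystack):
            return answer
    return None


index = build_index(SAMPLE_ROWS)
print(lookup_answer(index, "Solve this.\nHow many   primes are less than 20?\nAnswer:"))
```

A linear scan over compiled patterns is adequate for a dataset that fits comfortably in memory; a dict keyed on the normalized question would only work for exact matches.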
Success Rate Configuration
- Command-line argument: --success-rate 0.8 (80% success rate)
- Randomly determine correctness based on rate
- Log when responses are correct vs incorrect
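The correctness decision and its logging might look like the sketch below. The function name should_answer_correctly and the logger name are hypothetical; passing in a random.Random instance is a design assumption that makes runs reproducible when a seed is fixed.

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llama-server-simulator")


def should_answer_correctly(success_rate: float, rng: random.Random) -> bool:
    """Return True with probability success_rate and log the outcome."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # rng.random() is in [0, 1), so a rate of 1.0 always succeeds and 0.0 never does.
    correct = rng.random() < success_rate
    log.info("response will be %s", "correct" if correct else "incorrect")
    return correct


rng = random.Random(0)  # fixed seed only so that repeated runs behave the same
should_answer_correctly(0.8, rng)
```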
Testing Strategy
- Start simulator with default settings
- Send curl request with known question
- Verify response contains expected answer
- Test with different success rates
- Test edge cases
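For the "different success rates" step, an offline check can confirm that the random decision tracks the configured rate before any HTTP traffic is involved. This is a sketch only; empirical_rate is a hypothetical helper and the trial count and seed are arbitrary choices.

```python
import random


def empirical_rate(success_rate: float, trials: int = 10_000, seed: int = 0) -> float:
    """Simulate the correct/incorrect decision and return the observed fraction correct."""
    rng = random.Random(seed)
    hits = sum(rng.random() < success_rate for _ in range(trials))
    return hits / trials


for rate in (0.0, 0.5, 0.8, 1.0):
    print(rate, empirical_rate(rate))
```

With 10,000 trials the observed fraction should land within a couple of percentage points of the configured rate.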
Implementation Steps
Step 1: Basic Server Setup
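from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    # Handle request: parse the JSON body (None if it is not valid JSON)
    payload = request.get_json(silent=True)
    # Placeholder reply; Step 4 supplies the real answer text
    response = {"object": "chat.completion", "choices": []}
    return jsonify(response)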
Step 2: Load AIME Dataset
import datasets
ds = datasets.load_dataset("AI-MO/aimo-validation-aime", split="train")
# Store in memory
Step 3: Regex Matching
import re

def find_question_in_request(request_text):
    # Capture the text after "question:" up to the next newline or end of input
    pattern = r"question:\s*(.*?)(?:\n|$)"
    match = re.search(pattern, request_text, re.DOTALL)
    return match.group(1) if match else None
Step 4: Response Generation
import random

def generate_response(question, success_rate):
    # get_expected_answer and get_wrong_answer are helpers still to be written
    if random.random() < success_rate:
        return get_expected_answer(question)
    else:
        return get_wrong_answer(question)
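A wrong answer has to differ from the expected one while still looking plausible. Since AIME answers are integers from 0 to 999, one option is to shift the expected value by a non-zero offset modulo 1000. The helper below is a hypothetical sketch; unlike get_wrong_answer above, it takes the expected answer rather than the question.

```python
import random


def make_wrong_answer(expected: str, rng: random.Random) -> str:
    """Return an integer in 0..999, as a string, that differs from the expected answer."""
    correct = int(expected)
    # A non-zero offset below 1000 guarantees a different value after the modulo.
    offset = rng.randint(1, 999)
    return str((correct + offset) % 1000)


rng = random.Random(0)  # fixed seed only so that repeated runs behave the same
print(make_wrong_answer("204", rng))
```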
Step 5: Testing with Curl
curl -X POST http://localhost:8033/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Question text"}]
  }'
Configuration Options
- --port: Server port (default: 8033)
- --success-rate: Success rate 0-1 (default: 0.8)
- --host: Server host (default: localhost)
- --dataset-split: AIME split to use (default: train)
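These options map directly onto argparse. The sketch below uses the flag names and defaults listed above; the build_parser function name and the help strings are illustrative choices.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the simulator's command-line options with the defaults listed above."""
    parser = argparse.ArgumentParser(prog="llama-server-simulator")
    parser.add_argument("--port", type=int, default=8033, help="Server port")
    parser.add_argument("--success-rate", type=float, default=0.8, help="Success rate, 0 to 1")
    parser.add_argument("--host", default="localhost", help="Server host")
    parser.add_argument("--dataset-split", default="train", help="AIME split to use")
    return parser


# Parse an explicit argument list here so the example does not depend on sys.argv.
args = build_parser().parse_args(["--success-rate", "0.5"])
print(args.port, args.success_rate, args.host, args.dataset_split)
```

argparse converts the dashes in --success-rate and --dataset-split to underscores, so the values are read as args.success_rate and args.dataset_split.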
Expected Output
=== llama-server-simulator ===
Server running on http://localhost:8033
Success rate: 0.8
AIME dataset loaded: 90 questions
Testing Checklist
- Server starts successfully
- Basic request/response works
- Correct answer returned when success rate allows
- Wrong answer returned when success rate doesn't allow
- No question found returns error
- Multiple requests work correctly
- Different success rates work as expected
Next Steps
- Implement basic server structure
- Load AIME dataset
- Implement regex matching
- Add response generation with success rate
- Test with curl commands
- Integrate with eval script once simulator works