- Add Grader class supporting regex and CLI-based grading
- Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande
- Add CLI grader interface: `python script.py --answer <pred> --expected <gold>`
- Add HF telemetry disable to avoid warnings
- Support exact match requirement for regex patterns
- Add 30-second timeout for CLI grader
- Handle both boxed and plain text formats for AIME answers

| Name | | |
|---|---|---|
| llama-eval-discussion.md | ||
| llama-eval-new.py | ||
| llama-eval.py | ||
| llama-server-simulator-plan.md | ||
| llama-server-simulator.py | ||
| simulator-summary.md | ||
| test-grader.py | ||
| test-simulator.sh | ||
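
The commit notes above mention regex grading that handles both boxed and plain-text AIME answers, plus a CLI grader with a 30-second timeout. The following is a minimal sketch of how those two pieces might look, not the code in `llama-eval-new.py`. The function names (`extract_aime_answer`, `grade_regex`, `grade_cli`), the regex patterns, and the convention that the grader script signals a correct answer with exit code 0 are all assumptions made for illustration.

```python
import re
import subprocess
import sys


def extract_aime_answer(text):
    """Return the last \\boxed{N} integer if present, else the last plain integer, else None.

    Hypothetical patterns: the real built-in AIME pattern may differ.
    """
    boxed = re.findall(r"\\boxed\{\s*(\d+)\s*\}", text)
    if boxed:
        return boxed[-1]
    plain = re.findall(r"\b(\d+)\b", text)
    return plain[-1] if plain else None


def grade_regex(pred, gold):
    """Exact string match between the extracted answer and the expected answer."""
    answer = extract_aime_answer(pred)
    return answer is not None and answer == gold.strip()


def grade_cli(script, pred, gold, timeout=30.0):
    """Run `python <script> --answer <pred> --expected <gold>`.

    Assumption: the script exits with code 0 when the answer is correct.
    A run that exceeds `timeout` seconds is graded as incorrect.
    """
    try:
        result = subprocess.run(
            [sys.executable, script, "--answer", pred, "--expected", gold],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Preferring the boxed match over plain integers avoids picking up intermediate numbers from the model's reasoning when a final `\boxed{...}` answer is present. Treating a timeout as an incorrect answer keeps a hung grader script from stalling the whole evaluation run.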