4 Commits

Author SHA1 Message Date
Georgi Gerganov 87f8930968
examples: remove HF_HUB_OFFLINE to allow dataset download 2026-02-15 21:08:23 +02:00
Georgi Gerganov 9453f9de12
examples: use HF_HUB_OFFLINE to avoid HF Hub warnings 2026-02-15 21:08:23 +02:00
Georgi Gerganov 5a1be6ce37
examples: implement flexible grader system for answer validation
- Add Grader class supporting regex and CLI-based grading
- Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande
- Add CLI grader interface: python script.py --answer <pred> --expected <gold>
- Add HF telemetry disable to avoid warnings
- Support exact match requirement for regex patterns
- Add 30-second timeout for CLI grader
- Handle both boxed and plain text formats for AIME answers
2026-02-15 21:08:23 +02:00
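
The grader described in commit 5a1be6ce37 could look roughly like the sketch below. It is not the code from the commit. The pattern names, the `Grader` fields, and the convention that the CLI grader signals a correct answer with exit code 0 are all assumptions made for illustration. Only the `--answer`/`--expected` interface, the 30-second timeout, and the boxed/plain AIME formats come from the commit message.

```python
import re
import subprocess
import sys
from dataclasses import dataclass
from typing import Optional

# Hypothetical AIME patterns: prefer a \boxed{...} integer, else a bare integer.
AIME_BOXED = re.compile(r"\\boxed\{\s*(\d{1,3})\s*\}")
AIME_PLAIN = re.compile(r"\b(\d{1,3})\b")


@dataclass
class Grader:
    mode: str = "regex"            # "regex" or "cli"
    script: Optional[str] = None   # path to an external grader script (CLI mode)
    timeout_s: float = 30.0        # timeout named in the commit message

    def extract(self, text: str) -> Optional[str]:
        """Return the last boxed integer, or the last plain integer, if any."""
        boxed = AIME_BOXED.findall(text)
        if boxed:
            return boxed[-1]
        plain = AIME_PLAIN.findall(text)
        return plain[-1] if plain else None

    def grade(self, pred: str, expected: str) -> bool:
        if self.mode == "cli":
            return self._grade_cli(pred, expected)
        answer = self.extract(pred)
        # Exact match on the integer value, so "042" equals "42".
        return answer is not None and int(answer) == int(expected)

    def _grade_cli(self, pred: str, expected: str) -> bool:
        # Assumption: the external script exits with 0 when the answer is correct.
        try:
            result = subprocess.run(
                [sys.executable, self.script,
                 "--answer", pred, "--expected", expected],
                capture_output=True, text=True, timeout=self.timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0
```

Treating a timeout as an incorrect answer keeps a hung grader from stalling the whole run; the actual script may handle that case differently.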
Georgi Gerganov 5cc2258e82
examples: add simplified llama-eval-new.py for AIME evaluation
- Create new simplified evaluation script focused only on AIME
- Implement EvalState and Processor dataclasses for structured state management
- Add real-time feedback showing correct/incorrect status per case
- Abstract grading interface for external grader support
- Use structured JSON output for eval state
- Apply HuggingFace dataset caching to avoid repeated downloads
- Remove Levenshtein matching - eval script only sends requests and validates answers
2026-02-15 21:08:22 +02:00
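
As a rough illustration of the structure named in commit 5cc2258e82, the sketch below pairs an `EvalState` dataclass with a `Processor` that sends each question, grades the response, prints per-case feedback, and serializes the state to JSON. All field names, the `ask` and `grade` callables, and the JSON layout are hypothetical; the commit message names only the two dataclasses, the per-case feedback, and the JSON output.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class EvalState:
    task: str
    total: int = 0
    correct: int = 0
    cases: List[Dict[str, object]] = field(default_factory=list)

    def record(self, case_id: int, expected: str, response: str, ok: bool) -> None:
        """Append one graded case and update the running counters."""
        self.total += 1
        self.correct += int(ok)
        self.cases.append(
            {"id": case_id, "expected": expected, "response": response, "ok": ok}
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


@dataclass
class Processor:
    ask: Callable[[str], str]          # hypothetical: sends a prompt, returns text
    grade: Callable[[str, str], bool]  # hypothetical: (response, expected) -> bool
    state: EvalState

    def run(self, questions: Sequence[Tuple[str, str]]) -> EvalState:
        for i, (prompt, expected) in enumerate(questions):
            response = self.ask(prompt)
            ok = self.grade(response, expected)
            self.state.record(i, expected, response, ok)
            # Real-time feedback per case.
            print(f"[{i + 1}/{len(questions)}] {'correct' if ok else 'incorrect'}")
        return self.state
```

Keeping the grader behind a plain callable is one way to leave room for an external grader, in line with the "abstract grading interface" bullet.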