llama.cpp/examples/llama-eval/README.md

llama-eval.py is a single-script evaluation runner that sends prompt/response pairs to any OpenAI-compatible HTTP server (by default, a local llama-server). For example:

```sh
./llama-server -m model.gguf --port 8033
python examples/llama-eval/llama-eval.py --path_server http://localhost:8033 --n_prompts 100 --prompt_source arc
```
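Under the hood, the runner talks to the server over the standard OpenAI-compatible HTTP API. A minimal sketch of how such a request could be built with the Python standard library (the payload fields and the `/v1/chat/completions` route are assumptions here; the actual llama-eval.py may use different parameters or endpoints):

```python
import json
from urllib import request

def build_request(server: str, prompt: str, model: str = "model.gguf") -> request.Request:
    # Hypothetical payload for an OpenAI-compatible chat-completions endpoint;
    # llama-eval.py's real request shape may differ.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic decoding is typical for evaluation
    }
    return request.Request(
        f"{server}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Example (constructs the request only; nothing is sent):
req = build_request("http://localhost:8033", "What is 2 + 2?")
print(req.full_url)  # http://localhost:8033/v1/chat/completions
```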

The supported tasks are:

- GSM8K — grade-school math
- AIME — competition math (integer answers)
- MMLU — multi-domain multiple choice
- HellaSwag — commonsense reasoning multiple choice
- ARC — grade-school science multiple choice
- WinoGrande — commonsense coreference multiple choice
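For the multiple-choice tasks above, scoring ultimately reduces to extracting the model's chosen option and comparing it against the gold label. A toy illustration of that kind of check (this is not the script's actual extraction logic, just a sketch of the idea):

```python
import re

def extract_choice(response: str):
    # Toy extractor: take the last standalone A-D letter in the response.
    # The real llama-eval.py extraction rules may differ.
    matches = re.findall(r"\b([A-D])\b", response.upper())
    return matches[-1] if matches else None

def accuracy(responses, gold):
    # Fraction of responses whose extracted letter matches the gold label.
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)

responses = ["The answer is B.", "C", "I think the answer is A"]
print(accuracy(responses, ["B", "C", "D"]))  # 2 of 3 correct
```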