Add readme

This commit is contained in:
gatbontonpc 2026-01-12 13:53:39 -05:00
parent f3a5b4ea72
commit b0d50a5681
2 changed files with 21 additions and 1 deletions

@ -0,0 +1,20 @@
# llama.cpp/examples/llama-eval
The purpose of this example is to run evaluation metrics against an OpenAI-compatible LLM API over HTTP (llama-server).
```bash
./llama-server -m model.gguf --port 8033
```
```bash
python examples/llama-eval/llama-eval.py --path_server http://localhost:8033 --n_prompts 100 --prompt_source arc
```
## Supported tasks (MVP)
- **GSM8K** — grade-school math (final-answer only)
- **AIME** — competition math (final-answer only)
- **MMLU** — multi-domain knowledge (multiple choice)
- **HellaSwag** — commonsense reasoning (multiple choice)
- **ARC** — grade-school science reasoning (multiple choice)
- **WinoGrande** — commonsense coreference resolution (multiple choice)
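For the multiple-choice tasks above, an eval loop of this kind boils down to formatting each question as a prompt and sending it to the OpenAI-compatible endpoint that llama-server exposes. A minimal sketch, assuming a `/v1/chat/completions` endpoint at the port used above (the helper names `build_mc_prompt` and `query_server` are illustrative, not part of llama-eval.py):

```python
import json
import urllib.request


def build_mc_prompt(question: str, choices: list[str]) -> str:
    """Format a multiple-choice question as a single prompt string."""
    letters = "ABCDEFGH"
    lines = [question]
    lines += [f"{letters[i]}. {c}" for i, c in enumerate(choices)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def query_server(prompt: str, base_url: str = "http://localhost:8033") -> str:
    """POST a chat completion to an OpenAI-compatible server and
    return the model's text response."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # greedy decoding for reproducible scoring
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Scoring is then a string comparison between the returned letter and the dataset's gold answer.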

View File

@ -576,7 +576,7 @@ if __name__ == "__main__":
"--prompt_source",
type=str,
default="mmlu",
help=f"Eval types supported: all,{TASK_DICT.keys()}",
help=f"Eval types supported: all,{list(TASK_DICT.keys())}",
)
parser.add_argument(
"--n_prompts", type=int, default=None, help="Number of prompts to evaluate"