Commit Graph

1649 Commits

Author SHA1 Message Date
Georgi Gerganov 2ffa45edfc
add tokens 2026-02-16 21:52:54 +02:00
Georgi Gerganov 9c29be1177
store full response 2026-02-16 21:44:29 +02:00
Georgi Gerganov 013963cfd5
add html 2026-02-16 21:22:06 +02:00
Georgi Gerganov e2e998a2d6
fix prompts 2026-02-16 21:02:25 +02:00
Georgi Gerganov 6c41664b8b
simplify 2026-02-16 19:50:27 +02:00
Georgi Gerganov 7b84af8051
fix counts 2026-02-16 16:38:31 +02:00
Georgi Gerganov 60a501e138
cleanup 2026-02-16 16:31:14 +02:00
Georgi Gerganov e6e777cfb3
resume eval 2026-02-16 16:21:36 +02:00
Georgi Gerganov ad3a54eb68
ignore errors 2026-02-16 15:23:23 +02:00
Georgi Gerganov c6d70b9bea
add AGENTS.md 2026-02-16 13:13:35 +02:00
Georgi Gerganov de956a6ca8
cleanup 2026-02-16 12:02:16 +02:00
Georgi Gerganov 350e7c1409
datasets : fix aime2025 2026-02-16 11:55:57 +02:00
Georgi Gerganov db10dda1f3
grade : improve regex + logs 2026-02-16 11:51:36 +02:00
Georgi Gerganov 52759bf078
grader : update prompt 2026-02-16 11:17:53 +02:00
Georgi Gerganov 99e3c3d02c
datasets : add aime2025 2026-02-16 11:07:54 +02:00
Georgi Gerganov c6315655b7
cont 2026-02-16 10:56:58 +02:00
Georgi Gerganov f762a71d56
grader : improve example answers 2026-02-16 10:51:41 +02:00
Georgi Gerganov 73e61d5b75
rename 2026-02-16 10:30:10 +02:00
Georgi Gerganov cffd268bb3
add gpqa + sampling + docs 2026-02-16 00:52:33 +02:00
Georgi Gerganov e8a807519a
datasets : add gsm8k 2026-02-15 23:19:46 +02:00
Georgi Gerganov 1db8428f00
remove old files 2026-02-15 22:16:54 +02:00
Georgi Gerganov 7751ae2796
docs 2026-02-15 22:15:50 +02:00
Georgi Gerganov d2b10302ce
improve grader 2026-02-15 22:12:02 +02:00
Georgi Gerganov 68dde884d6
minor 2026-02-15 21:21:40 +02:00
Georgi Gerganov fd90796da2
eval : support multiple dataset runs 2026-02-15 21:08:24 +02:00
Georgi Gerganov 8156d549f6
sim : fix answer matching 2026-02-15 21:08:24 +02:00
Georgi Gerganov 9695e6feb4
test : fix path 2026-02-15 21:08:24 +02:00
Georgi Gerganov fb1481d60d
eval : add prompts 2026-02-15 21:08:24 +02:00
Georgi Gerganov 812ae13ec1
eval : print progress 2026-02-15 21:08:24 +02:00
Georgi Gerganov e79e8d02d5
examples: add task summary table to llama-eval-new.py 2026-02-15 21:08:23 +02:00
Georgi Gerganov a939f4c47e
docs: update llama-eval-discussion.md with threading and model parameter updates
- Add threading support implementation details
- Document ThreadPoolExecutor usage and thread safety
- Add model parameter implementation details
- Include testing results for both features
2026-02-15 21:08:23 +02:00
Georgi Gerganov 62b04cef54
examples: add threading support and model parameter to llama-eval-new.py
- Add ThreadPoolExecutor for parallel request processing controlled by --threads
- Add --model argument to specify model name in request data
- Refactor process() to use thread-safe _process_single_case() method
- Update progress tracking to work with concurrent execution
2026-02-15 21:08:23 +02:00
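The threading change described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from llama-eval-new.py: `process_cases`, `handle_case`, and `on_progress` are hypothetical names, and `threads` stands in for the value of the `--threads` argument.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_cases(cases, handle_case, threads=4, on_progress=None):
    """Run handle_case over all cases in parallel, returning results in input order.

    handle_case is assumed to be thread-safe (hypothetical stand-in for a
    per-case request function).
    """
    results = [None] * len(cases)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Map each future back to the index of the case it was created for.
        futures = {pool.submit(handle_case, case): i for i, case in enumerate(cases)}
        # as_completed yields futures as they finish, so progress is reported
        # from the calling thread and needs no extra locking.
        for done, future in enumerate(as_completed(futures), start=1):
            results[futures[future]] = future.result()
            if on_progress is not None:
                on_progress(done, len(cases))
    return results
```

Collecting results by index keeps the output order stable even though cases finish in arbitrary order.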
Georgi Gerganov 37b26cafee
docs: update llama-eval-discussion.md with session work summary 2026-02-15 21:08:23 +02:00
Georgi Gerganov 04f6872116
examples: use cached dataset path in simulator to avoid HF Hub requests 2026-02-15 21:08:23 +02:00
Georgi Gerganov c2619c18bf
examples: use cached dataset path to avoid HF Hub requests 2026-02-15 21:08:23 +02:00
Georgi Gerganov 87f8930968
examples: remove HF_HUB_OFFLINE to allow dataset download 2026-02-15 21:08:23 +02:00
Georgi Gerganov 9453f9de12
examples: use HF_HUB_OFFLINE to avoid HF Hub warnings 2026-02-15 21:08:23 +02:00
Georgi Gerganov 5a1be6ce37
examples: implement flexible grader system for answer validation
- Add Grader class supporting regex and CLI-based grading
- Implement built-in regex patterns for AIME, GSM8K, MMLU, HellaSwag, ARC, WinoGrande
- Add CLI grader interface: python script.py --answer <pred> --expected <gold>
- Add HF telemetry disable to avoid warnings
- Support exact match requirement for regex patterns
- Add 30-second timeout for CLI grader
- Handle both boxed and plain text formats for AIME answers
2026-02-15 21:08:23 +02:00
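A rough sketch of the two grading modes listed in the commit above, under stated assumptions. The regex patterns, the function names, and the convention that the CLI grader signals a correct answer with exit code 0 are all assumptions made for illustration. Only the `--answer`/`--expected` arguments and the 30-second timeout come from the commit message.

```python
import re
import subprocess
import sys

# Hypothetical patterns: prefer a \boxed{...} integer, else the last bare integer.
BOXED = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")
PLAIN = re.compile(r"-?\d+")


def extract_answer(text):
    """Return the last boxed integer in text, or the last plain integer, or None."""
    boxed = BOXED.findall(text)
    if boxed:
        return boxed[-1]
    plain = PLAIN.findall(text)
    return plain[-1] if plain else None


def grade_regex(response, expected):
    """Exact match between the extracted integer and the expected integer."""
    answer = extract_answer(response)
    return answer is not None and int(answer) == int(expected)


def grade_cli(script, predicted, expected, timeout=30):
    """Delegate grading to an external script; exit code 0 is assumed to mean correct."""
    try:
        proc = subprocess.run(
            [sys.executable, script, "--answer", predicted, "--expected", expected],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a timed-out grader as an incorrect answer
    return proc.returncode == 0
```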
Georgi Gerganov a80814e97b
docs: remove README.md from llama-eval 2026-02-15 21:08:23 +02:00
Georgi Gerganov 5cc2258e82
examples: add simplified llama-eval-new.py for AIME evaluation
- Create new simplified evaluation script focused only on AIME
- Implement EvalState and Processor dataclasses for structured state management
- Add real-time feedback showing correct/incorrect status per case
- Abstract grading interface for external grader support
- Use structured JSON output for eval state
- Apply HuggingFace dataset caching to avoid repeated downloads
- Remove Levenshtein matching - eval script only sends requests and validates answers
2026-02-15 21:08:22 +02:00
Georgi Gerganov c87af1d527
docs: update llama-eval-discussion.md with session work summary
Add summary of llama-server-simulator implementation work including
features, testing results, technical decisions, and refactoring.
2026-02-15 21:08:22 +02:00
Georgi Gerganov 23d4e21a81
examples: refactor test-simulator.sh for better readability
Extract repeating question string into TEST_QUESTION variable and
create make_request() helper function to reduce code duplication.
Add proper error handling for error responses.
2026-02-15 21:08:22 +02:00
Georgi Gerganov 07d5e1e0ea
examples: add llama-server simulator for testing eval scripts
Add a standalone Python script that simulates a llama-server HTTP endpoint
for testing the eval script. The simulator:

- Implements /v1/chat/completions endpoint with OpenAI-compatible format
- Loads AIME dataset from HuggingFace with local caching
- Uses Levenshtein distance for intelligent question matching
- Supports configurable success rate for correct/wrong answer generation
- Provides debug logging for troubleshooting

Also includes test scripts and documentation for testing and understanding
the simulator functionality.
2026-02-15 21:08:22 +02:00
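The question matching mentioned in the commit above could look roughly like this. It is a sketch only: the simulator may well use a library for the distance computation, and `levenshtein` and `closest_question` are hypothetical names written in plain Python to keep the example self-contained.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the inner loop to bound row size
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_question(prompt, questions):
    """Return the known question with the smallest edit distance to prompt."""
    return min(questions, key=lambda q: levenshtein(prompt, q))
```

Matching by edit distance lets the simulator find the intended dataset entry even when the eval script wraps or slightly reformats the question text.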
gatbontonpc 8839037528
add checkpointing 2026-02-15 21:08:22 +02:00
gatbontonpc 89cab3dbc5
Add readme 2026-02-15 21:08:22 +02:00
gatbontonpc c2d83ca048
multi source llama-eval 2026-02-15 21:08:22 +02:00
gatbontonpc c05df17ce3
working llama-eval mc and math suite 2026-02-15 21:08:19 +02:00
Daniel Bevenius 6ab881b7c3
model-conversion : add tensor-info.py utility (#18954)
This commit adds a new Python script that can be used to print information
about a tensor in a safetensors model.

The motivation for this is that during model conversion work it can
sometimes be useful to verify the shape of tensors in the original
model. While it is possible to print the tensors when loading the model,
this can be slow when working with larger models.
With this script it is possible to quickly query tensor shapes.

Example usage:
```console
(venv) $ ./scripts/utils/tensor-info.py --help
usage: tensor-info.py [-h] [-m MODEL_PATH] [-l] [tensor_name]

Print tensor information from a safetensors model

positional arguments:
  tensor_name           Name of the tensor to inspect

options:
  -h, --help            show this help message and exit
  -m MODEL_PATH, --model-path MODEL_PATH
                        Path to the model directory (default: MODEL_PATH environment variable)
  -l, --list            List unique tensor patterns in the model (layer numbers replaced with #)
```

Listing tensor names:
```console
(venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m -l
embed_tokens.weight
layers.#.input_layernorm.weight
layers.#.mlp.down_proj.weight
layers.#.mlp.gate_proj.weight
layers.#.mlp.up_proj.weight
layers.#.post_attention_layernorm.weight
layers.#.post_feedforward_layernorm.weight
layers.#.pre_feedforward_layernorm.weight
layers.#.self_attn.k_norm.weight
layers.#.self_attn.k_proj.weight
layers.#.self_attn.o_proj.weight
layers.#.self_attn.q_norm.weight
layers.#.self_attn.q_proj.weight
layers.#.self_attn.v_proj.weight
norm.weight
```

Printing a specific tensor's information:
```console
(venv) $ ./scripts/utils/tensor-info.py -m ~/work/ai/models/google/embeddinggemma-300m layers.0.input_layernorm.weight
Tensor: layers.0.input_layernorm.weight
File:   model.safetensors
Shape:  [768]
```
2026-02-04 10:40:53 +01:00
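The `-l` output above, where layer numbers are replaced with `#`, can be approximated with a short regex. This is a sketch of the idea rather than the code in tensor-info.py; `tensor_pattern` and `unique_patterns` are hypothetical names.

```python
import re


def tensor_pattern(name):
    """Replace numeric path components between dots (layer indices) with '#'."""
    return re.sub(r"(?<=\.)\d+(?=\.)", "#", name)


def unique_patterns(names):
    """Collapse per-layer tensor names into a sorted list of unique patterns."""
    return sorted({tensor_pattern(n) for n in names})
```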
Daniel Bevenius 6156ae5111
model-conversion : add debug option to conversion script (#19265)
This commit adds a debug option to the model conversion script to enable
using the Python debugger (pdb) during model conversion.

The motivation for this is that I've found myself adding this a few
times now and it would be quicker to have this flag as an option and a
makefile target/recipe for it.
2026-02-02 11:29:57 +01:00
Christian Kastner 7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00