Update README.md
More information is available in <https://github.com/ggml-org/llama.cpp/pull/486
```
-m model.gguf -f some-text.txt [-o imatrix.gguf] [--output-format {gguf,dat}] [--no-ppl] \
    [--process-output] [--chunks 123] [--save-frequency 0] [--output-frequency 10] \
    [--in-file imatrix-prev-0.gguf --in-file imatrix-prev-1.gguf ...] [--parse-special] \
    [--activation-statistics] [--show-statistics] [...]
```
Here `-m | --model` with a model name and `-f | --file` with a file containing calibration data (such as `wiki.train.raw`) are mandatory.
The parameters in square brackets are optional and have the following meaning:
* `--chunks` maximum number of chunks to process. Default is `-1` for all available chunks.
* `--no-ppl` disables the calculation of perplexity for the processed chunks. Useful if you want to speed up processing and do not care about perplexity.
* `--show-statistics` displays the imatrix file's statistics.
* `--activation-statistics` enables the collection of activation statistics for each tensor. If set, the imatrix file size will double, but the reported statistics will be more accurate.

For faster computation, make sure to use GPU offloading via the `-ngl | --n-gpu-layers` argument.
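Multiple `--in-file` arguments, as shown in the synopsis above, combine earlier imatrix runs into one file. Conceptually the merge is a count-weighted average of the per-value statistics; the sketch below uses a hypothetical `merge` helper to illustrate the idea and is not llama.cpp's actual implementation:

```python
# Hypothetical sketch: combining two imatrix datasets by averaging their
# per-value mean squared activations, weighted by how many chunks produced each.
def merge(means_a, count_a, means_b, count_b):
    total = count_a + count_b
    return [(a * count_a + b * count_b) / total
            for a, b in zip(means_a, means_b)]

merged = merge([1.0, 2.0], 10, [3.0, 4.0], 30)  # -> [2.5, 3.5]
```

With this weighting, a run that processed 30 chunks contributes three times as much to the merged statistics as one that processed 10.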
Versions **b5942** and newer of `llama-imatrix` store data in GGUF format by default.
```bash
./llama-imatrix -m ggml-model-f16.gguf -f calibration-data.txt --chunks 5 --output-frequency 20 --save-frequency 50 --parse-special
```
```bash
# generate imatrix and enable activation-based statistics
./llama-imatrix -m ggml-model-f16.gguf -f calibration-data.txt --activation-statistics -ngl 99
```

```bash
# analyse imatrix file and display summary statistics instead of running inference
./llama-imatrix --in-file imatrix.gguf --show-statistics
```
## Statistics
Beginning with version <bwxyz>, `--show-statistics` has two modes. If `--activation-statistics` was used at imatrix creation time and `--output-format` was set to `gguf`, it reports precise statistics. Otherwise, it reports less accurate, albeit still useful, metrics based on average squared activations.
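To illustrate the gap between the two modes with a toy example (not llama.cpp code): from average squared activations alone, only magnitude-based metrics can be derived, while direct activation statistics also preserve sign-dependent values such as the mean, minimum, and maximum:

```python
activations = [0.5, -1.5, 2.0, -0.5]  # toy per-channel activations

# Legacy mode: only the mean of squared activations survives,
# so the signs of the activations are lost.
mean_sq = sum(a * a for a in activations) / len(activations)  # 1.6875

# --activation-statistics with GGUF output: direct statistics remain available.
direct = {
    "mean": sum(activations) / len(activations),  # 0.125
    "min": min(activations),                      # -1.5
    "max": max(activations),                      # 2.0
}
```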
#### Per tensor