examples : add info about hybrid sampling in batched [no ci]

2025-11-25 08:12:42 +01:00 · 2025-11-25 08:12:42 +01:00 · 0f17ccdee7
parent 2b4c7927ee
commit 0f17ccdee7
1 changed files with 14 additions and 1 deletions
--- a/examples/batched/README.md
+++ b/examples/batched/README.md
@@ -53,4 +53,17 @@ performed on the backend device, like a GPU.
    --backend_sampling --top-k 80 --backend_dist
 ```
 The `--verbose` flag can be added to see more detailed output and also show
-that the backend samplers are being used.
+that the backend samplers are being used. The example above performs the distribution
+sampling step on the backend device and transfers only the sampled token ids back to the host.
+It is also possible to perform part of the sampling on the backend and let CPU samplers
+process those results further. This is sometimes referred to as hybrid sampling.
+To try this, remove `--backend_dist` from the command above:
+```bash
+./llama-batched \
+   -m models/Qwen2.5-VL-3B-Instruct-Q8_0.gguf -p "Hello my name is" \
+   -np 4 -kvu \
+   --backend_sampling --top-k 80 -v
+```
+This performs the top-k filtering on the backend device and then transfers the filtered logits
+back to the host, where the final token is sampled.
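The hybrid flow described above can be illustrated with a small conceptual sketch. This is not llama.cpp's implementation. It splits sampling into a top-k stage that stands in for the backend device and a softmax-plus-draw stage that stands in for the CPU. The helper names `backend_top_k`, `cpu_dist_sample` and `hybrid_sample` are hypothetical. Representing the filtered logits as `(token_id, logit)` pairs is also an assumption of this sketch, and it says nothing about how the real transfer is laid out.

```python
import math
import random

# Hypothetical helpers illustrating hybrid sampling; these are NOT llama.cpp APIs.


def backend_top_k(logits, k):
    """Stand-in for the backend (e.g. GPU) stage: keep the k largest logits.

    Returns (token_id, logit) pairs, i.e. the filtered candidates that
    would be copied back to the host in this sketch. Assumes k >= 1.
    """
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(i, logits[i]) for i in ranked[:k]]


def cpu_dist_sample(candidates, rng):
    """Stand-in for the host stage: softmax over the candidates, then draw one id."""
    max_logit = max(logit for _, logit in candidates)  # subtract max for numerical stability
    weights = [math.exp(logit - max_logit) for _, logit in candidates]
    ids = [token_id for token_id, _ in candidates]
    return rng.choices(ids, weights=weights, k=1)[0]


def hybrid_sample(logits, k, seed=0):
    """Top-k on the 'backend', distribution sampling on the 'host'.

    Returns the sampled token id and how many candidates crossed to the host.
    """
    rng = random.Random(seed)
    candidates = backend_top_k(logits, k)  # would run on the backend device
    transferred = len(candidates)          # at most k values instead of len(logits)
    return cpu_dist_sample(candidates, rng), transferred


vocab_logits = [0.1, 2.5, -1.0, 3.0, 0.7, 2.9]
print(backend_top_k(vocab_logits, 3))  # [(3, 3.0), (5, 2.9), (1, 2.5)]
token, transferred = hybrid_sample(vocab_logits, k=3, seed=42)
print(token, transferred)
```

With `k` much smaller than the vocabulary size, the host receives far fewer values than if the full logit vector were copied, and it can still apply further CPU samplers to the candidates. In the `--backend_dist` case, both stages would run on the device and only the sampled id would cross to the host.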