From 0f17ccdee7e8cd8d1d452f45d2f9d1cc8448276f Mon Sep 17 00:00:00 2001
From: Daniel Bevenius <daniel.bevenius@gmail.com>
Date: Tue, 25 Nov 2025 08:12:42 +0100
Subject: [PATCH] examples : add info about hybrid sampling in batched [no ci]

---
 examples/batched/README.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/examples/batched/README.md b/examples/batched/README.md
index de2aa41fba..f10639220e 100644
--- a/examples/batched/README.md
+++ b/examples/batched/README.md
@@ -53,4 +53,17 @@ performed on the backend device, like a GPU.
     --backend_sampling --top-k 80 --backend_dist
 ```
 The `--verbose` flag can be added to see more detailed output and also show
-that the backend samplers are being used.
+that the backend samplers are being used. The example above performs distribution
+sampling on the backend device and transfers only the sampled token IDs back to the host.
+
+It is also possible to perform only part of the sampling on the backend and let CPU samplers
+process those results further. This is sometimes referred to as hybrid sampling.
+For example, remove `--backend_dist` from the above command (`-v` enables verbose output):
+```bash
+./llama-batched \
+    -m models/Qwen2.5-VL-3B-Instruct-Q8_0.gguf -p "Hello my name is" \
+    -np 4 -kvu \
+    --backend_sampling --top-k 80 -v
+```
+This performs the top-k filtering on the backend device and then transfers the filtered logits
+back to the host, where the CPU samplers complete the sampling.