llama.cpp/examples/batched/README.md

1.5 KiB

llama.cpp/example/batched

The example demonstrates batched generation from a given prompt

./batched ./models/llama-7b-v2/llama-2-7b-chat.Q8_0.gguf "Hello my name is" 4

...

main: n_len = 32, n_ctx = 128, n_batch = 32, n_parallel = 4, n_kv_req = 113, n_threads = 16, n_threads_batch = 16

<s> Hello my name is

main: generating 4 sequences ...

main: stream 0 finished at n_cur = 32
main: stream 1 finished at n_cur = 32
main: stream 2 finished at n_cur = 32
main: stream 3 finished at n_cur = 32

sequence 0:

Hello my name is [Your Name], and I am a [Your Profession] with [Number of Years] of experience in the [Your Industry

sequence 1:

Hello my name is Drew and I am a 31 year old man from the United States. I have been a fan of anime for as

sequence 2:

Hello my name is Tiffany and I am a 30 year old female. I have been experiencing some symptoms that I am concerned about

sequence 3:

Hello my name is John and I am a 26 year old man from the United States. I have been experiencing some strange symptoms over the

main: decoded 108 tokens in 4.19 s, speed: 25.76 t/s

llama_print_timings:        load time =     549.60 ms
llama_print_timings:      sample time =       4.14 ms /   112 runs   (    0.04 ms per token, 27027.03 tokens per second)
llama_print_timings: prompt eval time =    4333.64 ms /   113 tokens (   38.35 ms per token,    26.08 tokens per second)
llama_print_timings:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings:       total time =    4742.24 ms /   114 tokens