llama.cpp/examples/parallel
SamareshSingh cb8f4fa3f8
Fix locale-dependent float printing in GGUF metadata (#17331)
* Set C locale for consistent float formatting across all binaries.

* Add C locale setting to all tools binaries

Add std::setlocale(LC_NUMERIC, "C") to all 16 binaries in the tools/
directory to ensure consistent floating-point formatting.

* Apply suggestion from @JohannesGaessler

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-03-04 09:30:40 +01:00
..
CMakeLists.txt ggml : move AMX to the CPU backend (#10570) 2024-11-29 21:54:58 +01:00
README.md parallel : increase the variability of the prompt lengths (#13927) 2025-05-30 19:38:07 +03:00
parallel.cpp Fix locale-dependent float printing in GGUF metadata (#17331) 2026-03-04 09:30:40 +01:00

README.md

llama.cpp/example/parallel

Simplified simulation of serving incoming requests in parallel

Example

Generate 128 client requests (-ns 128), simulating 8 concurrent clients (-np 8). The system prompt is shared (-pps), meaning that it is computed once at the start. The client requests consist of up to 10 junk questions (--junk 10) followed by the actual question.

llama-parallel -m model.gguf -np 8 -ns 128 --top-k 1 -pps --junk 10 -c 16384

[!NOTE] It's recommended to use base models with this example. Instruction tuned models might not be able to properly follow the custom chat template specified here, so the results might not be as expected.