llama.cpp/tools
Jianhui Zhou 5714d4b86e ggml: Add thread count control during repacking
This change enables the repack stage to utilize the user-specified
thread count, ensuring that both the logical thread IDs and the total
number of threads remain consistent between the repack and inference
stages.

In a NUMA architecture where the `--numa distribute` parameter is used,
logical threads are pinned to specific physical NUMA nodes. By aligning
the thread configuration across these two stages, we can fully leverage
the operating system's "first-touch" memory allocation policy:

1. Repack Stage: Logical thread i (bound to NUMA node j) is responsible
   for repacking and writing the weight data. Since the "first touch"
   occurs within this thread, the corresponding physical memory is
   allocated on node j.

2. Inference Stage: The same logical thread i (still bound to node j)
   reads these weights. Since the data already resides on the local
   node, low-latency local memory access is achieved.

Without ensuring consistency in the number of threads, data may be
randomly allocated to mismatched nodes, resulting in significant
cross-node access overhead during inference.

Signed-off-by: Jianhui Zhou <jonaszhou@zhaoxin.com>
2026-01-13 07:36:31 +00:00
..
batched-bench batched-bench : add "separate text gen" mode (#17103) 2025-11-10 12:59:29 +02:00
cli cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
completion common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
cvector-generator common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
export-lora cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
gguf-split cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
imatrix common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
llama-bench ggml: Add thread count control during repacking 2026-01-13 07:36:31 +00:00
mtmd mtmd: enhance image resizing in llava_uhd (#18014) 2025-12-14 15:57:52 +01:00
perplexity common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
quantize cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
rpc Install rpc-server when GGML_RPC is ON. (#17149) 2025-11-11 10:53:59 +00:00
run Manually link -lbsd to resolve flock symbol on AIX (#16610) 2025-10-23 19:37:31 +08:00
server common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
tokenize cmake : Do not install tools on iOS targets (#15903) 2025-09-16 09:54:44 +07:00
tts common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
CMakeLists.txt cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00