llama.cpp

Jianhui Zhou 5714d4b86e ggml: Add thread count control during repacking

This change enables the repack stage to utilize the user-specified thread count, ensuring that both the logical thread IDs and the total number of threads remain consistent between the repack and inference stages.

In a NUMA architecture where the `--numa distribute` parameter is used, logical threads are pinned to specific physical NUMA nodes. By aligning the thread configuration across these two stages, we can fully leverage the operating system's "first-touch" memory allocation policy:

1. Repack Stage: Logical thread i (bound to NUMA node j) is responsible for repacking and writing the weight data. Since the "first touch" occurs within this thread, the corresponding physical memory is allocated on node j.
2. Inference Stage: The same logical thread i (still bound to node j) reads these weights. Since the data already resides on the local node, low-latency local memory access is achieved.

Without ensuring consistency in the number of threads, data may be randomly allocated to mismatched nodes, resulting in significant cross-node access overhead during inference.

Signed-off-by: Jianhui Zhou <jonaszhou@zhaoxin.com>

2026-01-13 07:36:31 +00:00
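
The first-touch argument in the commit message above can be illustrated with a minimal C++ sketch. This is not the ggml implementation: `thread_rows`, `alloc_untouched`, `repack_stage`, and `inference_stage` are hypothetical names invented for illustration, the "repack" is a plain copy standing in for a real layout change, and thread-to-node pinning (which `--numa distribute` handles in llama.cpp) is omitted because it is platform specific. The only point shown is that both stages derive each thread's slice from the same `(ith, nth)` pair, so the thread that first writes a row is the thread that later reads it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: half-open row range [first, second) owned by
// logical thread `ith` out of `nth`. Both stages must use this same split.
std::pair<size_t, size_t> thread_rows(size_t n_rows, int ith, int nth) {
    assert(nth > 0 && ith >= 0 && ith < nth);
    const size_t per   = (n_rows + nth - 1) / nth;        // rows per thread, rounded up
    const size_t begin = std::min(n_rows, per * ith);
    const size_t end   = std::min(n_rows, begin + per);
    return {begin, end};
}

// Allocate without initializing, so the allocating thread does not write
// the buffer itself (a std::vector<float> would zero-fill it here).
std::unique_ptr<float[]> alloc_untouched(size_t n) {
    return std::unique_ptr<float[]>(new float[n]);
}

// Stage 1 (assumed shape): thread `ith` is the first to write its rows of `dst`.
// Under a first-touch policy, those pages would be placed on that thread's node.
void repack_stage(const float * src, float * dst, size_t n_rows, size_t n_cols, int nth) {
    std::vector<std::thread> workers;
    for (int ith = 0; ith < nth; ++ith) {
        workers.emplace_back([=]() {
            const auto rows = thread_rows(n_rows, ith, nth);
            for (size_t r = rows.first; r < rows.second; ++r) {
                for (size_t c = 0; c < n_cols; ++c) {
                    dst[r * n_cols + c] = src[r * n_cols + c];  // stand-in for repacking
                }
            }
        });
    }
    for (auto & w : workers) w.join();
}

// Stage 2 (assumed shape): the same logical thread reads the rows it wrote.
// Returns the sum of all weights as a trivial stand-in for inference work.
double inference_stage(const float * w, size_t n_rows, size_t n_cols, int nth) {
    std::vector<double> partial(nth, 0.0);
    std::vector<std::thread> workers;
    for (int ith = 0; ith < nth; ++ith) {
        workers.emplace_back([=, &partial]() {
            const auto rows = thread_rows(n_rows, ith, nth);
            double acc = 0.0;
            for (size_t r = rows.first; r < rows.second; ++r) {
                for (size_t c = 0; c < n_cols; ++c) {
                    acc += w[r * n_cols + c];
                }
            }
            partial[ith] = acc;  // each thread writes only its own slot
        });
    }
    for (auto & t : workers) t.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total;
}
```

If `repack_stage` ran with a different `nth` than `inference_stage`, the ranges returned by `thread_rows` would no longer line up, and a thread could end up reading rows first touched by a thread on another node. That mismatch is the situation the commit message describes.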
..
batched-bench	batched-bench : add "separate text gen" mode (#17103)	2025-11-10 12:59:29 +02:00
cli	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
completion	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
cvector-generator	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
export-lora	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
gguf-split	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
imatrix	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
llama-bench	ggml: Add thread count control during repacking	2026-01-13 07:36:31 +00:00
mtmd	mtmd: enhance image resizing in llava_uhd (#18014)	2025-12-14 15:57:52 +01:00
perplexity	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
quantize	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
rpc	Install rpc-server when GGML_RPC is ON. (#17149)	2025-11-11 10:53:59 +00:00
run	Manually link -lbsd to resolve flock symbol on AIX (#16610)	2025-10-23 19:37:31 +08:00
server	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
tokenize	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
tts	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
CMakeLists.txt	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00