llama.cpp/common
Jianhui Zhou 5714d4b86e ggml: Add thread count control during repacking
This change makes the repack stage use the user-specified thread count,
so that both the logical thread IDs and the total number of threads stay
consistent between the repack and inference stages.

On a NUMA system where the `--numa distribute` option is used, logical
threads are pinned to specific physical NUMA nodes. By keeping the thread
configuration identical across these two stages, we can fully leverage
the operating system's "first-touch" memory allocation policy:

1. Repack Stage: Logical thread i (bound to NUMA node j) is responsible
   for repacking and writing the weight data. Since the "first touch"
   occurs within this thread, the corresponding physical memory is
   allocated on node j.

2. Inference Stage: The same logical thread i (still bound to node j)
   reads these weights. Since the data already resides on the local
   node, low-latency local memory access is achieved.

Without a consistent thread count, weight data may be first-touched by a
thread running on a different node than the one that later reads it, so
pages end up on remote nodes and inference pays significant cross-node
access overhead.

Signed-off-by: Jianhui Zhou <jonaszhou@zhaoxin.com>
2026-01-13 07:36:31 +00:00
CMakeLists.txt server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
arg.cpp common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
arg.h arg: fix common_params_parse not accepting negated arg (#17991) 2025-12-13 12:53:37 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
chat-peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat-peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat.cpp common : add parser for ministral/mistral large 3/devstral 2 (#17713) 2025-12-09 17:31:04 -06:00
chat.h chat : reserve memory in compute_diffs and improve naming (#17729) 2025-12-03 17:22:10 +02:00
common.cpp ggml: Add thread count control during repacking 2026-01-13 07:36:31 +00:00
common.h common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
console.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
console.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
download.cpp common : add minimalist multi-thread progress bar (#17602) 2025-12-12 12:44:35 +01:00
download.h server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
http.h common: introduce http.h for httplib-based client (#16373) 2025-10-01 20:22:18 +03:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) 2025-12-02 17:33:50 +01:00
json-schema-to-grammar.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
llguidance.cpp llguidance : set tokenizer slices to default (#13424) 2025-05-10 17:19:52 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
preset.cpp preset: handle negated arg, reverse the meaning if needed (#18041) 2025-12-14 22:08:10 +01:00
preset.h server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
sampling.h common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
speculative.cpp common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
speculative.h server : implement universal assisted decoding (#12635) 2025-07-31 14:25:23 +02:00
unicode.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
unicode.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00