llama.cpp/common
Jianhui Zhou 5714d4b86e ggml: Add thread count control during repacking
This change makes the repack stage use the user-specified thread count,
so that both the logical thread IDs and the total number of threads stay
consistent between the repack and inference stages.

On a NUMA system where the `--numa distribute` option is used, logical
threads are pinned to specific physical NUMA nodes. By keeping the thread
configuration identical across these two stages, we can fully leverage
the operating system's "first-touch" memory allocation policy:

1. Repack Stage: Logical thread i (bound to NUMA node j) is responsible
   for repacking and writing the weight data. Since the "first touch"
   occurs within this thread, the corresponding physical memory is
   allocated on node j.

2. Inference Stage: The same logical thread i (still bound to node j)
   reads these weights. Since the data already resides on the local
   node, low-latency local memory access is achieved.

Without a consistent thread count, weight data may be first-touched by a
thread running on a different node than the one that later reads it, so
pages end up on remote nodes and inference pays significant cross-node
access overhead.

Signed-off-by: Jianhui Zhou <jonaszhou@zhaoxin.com>
2026-01-13 07:36:31 +00:00
CMakeLists.txt server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
arg.cpp common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
arg.h arg: fix common_params_parse not accepting negated arg (#17991) 2025-12-13 12:53:37 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
chat-peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat-peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat.cpp common : add parser for ministral/mistral large 3/devstral 2 (#17713) 2025-12-09 17:31:04 -06:00
chat.h chat : reserve memory in compute_diffs and improve naming (#17729) 2025-12-03 17:22:10 +02:00
common.cpp ggml: Add thread count control during repacking 2026-01-13 07:36:31 +00:00
common.h common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
console.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
console.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
download.cpp common : add minimalist multi-thread progress bar (#17602) 2025-12-12 12:44:35 +01:00
download.h server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
http.h common: introduce http.h for httplib-based client (#16373) 2025-10-01 20:22:18 +03:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) 2025-12-02 17:33:50 +01:00
json-schema-to-grammar.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
llguidance.cpp llguidance : set tokenizer slices to default (#13424) 2025-05-10 17:19:52 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
preset.cpp preset: handle negated arg, reverse the meaning if needed (#18041) 2025-12-14 22:08:10 +01:00
preset.h server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
sampling.h common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
speculative.cpp common : refactor common_sampler + grammar logic changes (#17937) 2025-12-14 10:11:13 +02:00
speculative.h server : implement universal assisted decoding (#12635) 2025-07-31 14:25:23 +02:00
unicode.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
unicode.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00