llama.cpp/common
Pascal f32ca51bfe
server: add presets (config) when using multiple models (#17859)
* llama-server: recursive GGUF loading

Replace the flat directory scan with a recursive traversal using
std::filesystem::recursive_directory_iterator, adding support for
nested vendor/model layouts (e.g. vendor/model/*.gguf).
The model name now reflects the relative path within --models-dir
instead of just the filename. Files are aggregated by parent
directory via std::map before constructing local_model.
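For illustration, a minimal sketch of that traversal (the directory value and the grouping container are assumptions; the real code builds local_model entries inside the server):

```cpp
#include <cstdio>
#include <filesystem>
#include <map>
#include <string>
#include <vector>

namespace fs = std::filesystem;

int main() {
    // hypothetical root; the real value comes from --models-dir
    const fs::path models_dir = "./models";
    if (!fs::is_directory(models_dir)) {
        return 1;
    }

    // group *.gguf files by parent directory, as described above
    std::map<fs::path, std::vector<fs::path>> by_parent;
    for (const auto & entry : fs::recursive_directory_iterator(models_dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".gguf") {
            by_parent[entry.path().parent_path()].push_back(entry.path());
        }
    }

    // the model name is the path relative to --models-dir, not just the filename
    for (const auto & [parent, files] : by_parent) {
        for (const auto & f : files) {
            std::printf("%s -> %s\n", fs::relative(f, models_dir).string().c_str(), f.string().c_str());
        }
    }
    return 0;
}
```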

* server : router config POC (INI-based per-model settings)

* server: address review feedback from @aldehir and @ngxson

PEG parser usage improvements:
- Simplify parser instantiation (remove arena indirection)
- Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
- Fix handling of the last line when it has no trailing newline (+ operator instead of <<)
- Remove redundant end position check

Feature scope:
- Remove auto-reload feature (will be separate PR per @ngxson)
- Keep config.ini auto-creation and template generation
- Preserve per-model customization logic

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>

* server: adopt aldehir's line-oriented PEG parser

Complete rewrite of INI parser grammar and visitor:
- Use p.chars(), p.negate(), p.any() instead of p.until()
- Support end-of-line comments (key=value # comment)
- Handle EOF without trailing newline correctly
- Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
- Simplified visitor (no pending state, no trim needed)
- Grammar handles whitespace natively via eol rule
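As a rough stand-in for two of those behaviors (end-of-line comments and a missing final newline), independent of the actual combinators in common/peg-parser.h:

```cpp
#include <string>
#include <vector>

// split text into logical lines; a last line without a trailing newline is kept
static std::vector<std::string> logical_lines(const std::string & text) {
    std::vector<std::string> out;
    std::string cur;
    for (char c : text) {
        if (c == '\n') {
            out.push_back(cur);
            cur.clear();
        } else {
            cur += c;
        }
    }
    if (!cur.empty()) {
        out.push_back(cur); // handles EOF without '\n'
    }
    return out;
}

// drop an end-of-line comment, e.g. "key = value  # comment" -> "key = value  "
static std::string strip_comment(const std::string & line) {
    const size_t pos = line.find('#');
    return pos == std::string::npos ? line : line.substr(0, pos);
}
```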

Business validation preserved:
- Reject section names starting with LLAMA_ARG_*
- Accept only keys starting with LLAMA_ARG_*
- Require explicit section before key-value pairs
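The business rules reduce to a few plain checks; a sketch with made-up names (accept_entry is not the actual server function):

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// identifier rule from the grammar: [a-zA-Z_][a-zA-Z0-9_.-]*
static bool is_valid_ident(const std::string & s) {
    if (s.empty() || !(std::isalpha((unsigned char) s[0]) || s[0] == '_')) {
        return false;
    }
    for (char c : s) {
        if (!(std::isalnum((unsigned char) c) || c == '_' || c == '.' || c == '-')) {
            return false;
        }
    }
    return true;
}

static bool starts_with(const std::string & s, const char * prefix) {
    return s.rfind(prefix, 0) == 0;
}

// section = current [section] name ("" if none seen yet), key = left-hand side of "key = value"
static bool accept_entry(const std::string & section, const std::string & key) {
    if (section.empty())                    return false; // key-value pair before any [section]
    if (starts_with(section, "LLAMA_ARG_")) return false; // section names must not look like settings
    if (!starts_with(key, "LLAMA_ARG_"))    return false; // only LLAMA_ARG_* keys are accepted
    return is_valid_ident(key);
}

int main() {
    std::printf("%d\n", accept_entry("my-model", "LLAMA_ARG_CTX_SIZE"));      // 1
    std::printf("%d\n", accept_entry("",         "LLAMA_ARG_CTX_SIZE"));      // 0
    std::printf("%d\n", accept_entry("LLAMA_ARG_FOO", "LLAMA_ARG_CTX_SIZE")); // 0
    return 0;
}
```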

Co-authored-by: aldehir <aldehir@users.noreply.github.com>

* server: fix CLI/env duplication in child processes

Children now receive minimal CLI args (executable, model, port, alias)
instead of inheriting all router args. Global settings pass through
LLAMA_ARG_* environment variables only, eliminating duplicate config
warnings.

Fixes: router args like -ngl and -fa were passed both via CLI and env,
causing 'will be overwritten' warnings on every child spawn.
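A simplified sketch of that scheme (the struct and helper names are invented; the flags and LLAMA_ARG_* variables shown are examples of the kind of settings moved to the environment):

```cpp
#include <cstdio>
#include <cstdlib>   // setenv (POSIX)
#include <string>
#include <vector>

struct child_model {
    std::string path;   // absolute path to the .gguf file
    std::string alias;  // name the router exposes for this model
    int         port;   // port assigned by the router
};

// minimal CLI for a child: executable, model, port, alias -- nothing else
static std::vector<std::string> child_args(const std::string & server_bin, const child_model & m) {
    return {
        server_bin,
        "--model", m.path,
        "--port",  std::to_string(m.port),
        "--alias", m.alias,
    };
}

int main() {
    // global settings travel via LLAMA_ARG_* environment variables instead of CLI,
    // so the child never sees the same option twice (no 'will be overwritten' warning)
    setenv("LLAMA_ARG_N_GPU_LAYERS", "99", /*overwrite=*/1); // instead of -ngl 99
    setenv("LLAMA_ARG_FLASH_ATTN",   "on", /*overwrite=*/1); // instead of -fa

    const child_model m = { "/abs/path/model.gguf", "my-model", 8081 };
    for (const auto & a : child_args("./llama-server", m)) {
        std::printf("%s ", a.c_str());
    }
    std::printf("\n");
    return 0;
}
```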

* add common/preset.cpp

* fix compile

* cont

* allow custom-path models

* add falsey check

* server: fix router model discovery and child process spawning

- Sanitize model names: replace / and \ with _ for display
- Recursive directory scan with relative path storage
- Convert relative paths to absolute when spawning children
- Filter router control args from child processes
- Refresh args after port assignment for correct port value
- Fallback preset lookup for compatibility
- Fix missing argv[0]: store server binary path before base_args parsing

* Revert "server: fix router model discovery and child process spawning"

This reverts commit e3832b42eeea7fcb108995966c7584479f745857.

* clarify about "no-" prefix

* correct render_args() to include binary path

* also remove arg LLAMA_ARG_MODELS_PRESET for child

* add co-author for ini parser code

Co-authored-by: aldehir <hello@alde.dev>

* also set LLAMA_ARG_HOST

* add CHILD_ADDR

* Remove dead code

---------

Co-authored-by: aldehir <aldehir@users.noreply.github.com>
Co-authored-by: ngxson <ngxson@users.noreply.github.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: aldehir <hello@alde.dev>
2025-12-10 22:18:21 +01:00
CMakeLists.txt server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
arg.cpp server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
arg.h server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167) 2025-06-13 10:38:52 +02:00
chat-parser-xml-toolcall.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser-xml-toolcall.h Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.cpp Fix Kimi-K2 tool-call parsing issues (#17376) 2025-12-08 14:32:04 +01:00
chat-parser.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
chat-peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat-peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
chat.cpp common : add parser for ministral/mistral large 3/devstral 2 (#17713) 2025-12-09 17:31:04 -06:00
chat.h chat : reserve memory in compute_diffs and improve naming (#17729) 2025-12-03 17:22:10 +02:00
common.cpp common : change --color to accept on/off/auto, default to auto (#17827) 2025-12-07 03:43:50 +01:00
common.h server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
console.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
console.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
download.cpp ggml webgpu: add support for emscripten builds (#17184) 2025-12-03 10:25:34 +01:00
download.h server: introduce API for serving / loading / unloading multiple models (#17470) 2025-12-01 19:41:04 +01:00
http.h common: introduce http.h for httplib-based client (#16373) 2025-10-01 20:22:18 +03:00
json-partial.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
json-partial.h sync : vendor (#13901) 2025-05-30 16:25:45 +03:00
json-schema-to-grammar.cpp Server: Change Invalid Schema from Server Error (500) to User Error (400) (#17572) 2025-12-02 17:33:50 +01:00
json-schema-to-grammar.h common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
llguidance.cpp llguidance : set tokenizer slices to default (#13424) 2025-05-10 17:19:52 +02:00
log.cpp cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
log.h cli: new CLI experience (#17824) 2025-12-10 15:28:59 +01:00
ngram-cache.cpp ggml : portability fixes for VS 2017 (#12150) 2025-03-04 18:53:26 +02:00
ngram-cache.h llama : use LLAMA_TOKEN_NULL (#11062) 2025-01-06 10:52:15 +02:00
peg-parser.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
peg-parser.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
preset.cpp server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
preset.h server: add presets (config) when using multiple models (#17859) 2025-12-10 22:18:21 +01:00
regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
regex-partial.h `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
sampling.cpp common : more accurate sampling timing (#17382) 2025-11-20 13:40:10 +02:00
sampling.h sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
speculative.cpp sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
speculative.h server : implement universal assisted decoding (#12635) 2025-07-31 14:25:23 +02:00
unicode.cpp common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00
unicode.h common : introduce composable PEG parser combinators for chat parsing (#17136) 2025-12-03 12:45:32 +02:00