llama.cpp

Commit Graph

Author	SHA1	Message	Date
Sheldon Robinson	3e5746e3a3	Merge branch 'ggml-org:master' into master	2025-12-31 06:54:14 -05:00
Aldehir Rojas	0f89d2ecf1	common : default content to an empty string (#18485 ) * common : default content to an empty string * common : fix tests that break when content != null	2025-12-30 12:00:57 -06:00
Xuan-Son Nguyen	cd78e57c3a	lora: count lora nodes in graph_max_nodes (#18469 ) * lora: count lora nodes in graph_max_nodes * 3 nodes per weight * 4 nodes * keep track n_lora_nodes from llama_model * fix assert * rm redundant header * common: load adapters before context creation * use 6 nodes	2025-12-30 15:53:12 +01:00
o7si	daa242dfc8	common: fix return value check for setpriority (#18412 ) * common: fix return value check for setpriority * tools: add logging for process priority setting	2025-12-29 11:07:49 +02:00
o7si	60f17f56da	rpc: fix segfault on invalid endpoint format (#18387 ) * rpc: fix segfault on invalid endpoint format * rpc: add error log for failed endpoint connection	2025-12-28 12:34:41 +02:00
Johannes Gäßler	026d2ad472	llama: fix magic number of 999 for GPU layers (#18266 ) * llama: fix magic number of 999 for GPU layers * use strings for -ngl, -ngld * encapsulate n_gpu_layers, split_mode	2025-12-27 20:18:35 +01:00
Xuan-Son Nguyen	f5acfb2ffa	server: (router) add stop-timeout option (#18350 ) * server: (router) add stop-timeout option * also allow stop while loading * add docs * unload_lru: also wait for unload to complete	2025-12-24 23:47:49 +01:00
Sheldon Robinson	4b29e0b0da	Merge branch 'ggml-org:master' into master	2025-12-24 07:19:20 -05:00
ddh0	10355dc7d0	common: add `LLAMA_ARG_OVERRIDE_TENSOR` env var for `-ot` arg (#18267 )	2025-12-24 14:19:12 +08:00
Johannes Gäßler	147a521636	tool/ex/tests: consistently free ctx, then model (#18168 )	2025-12-22 11:00:37 +01:00
Aldehir Rojas	9496bbb808	common : reorganize includes to prioritize vendored deps (#18222 )	2025-12-20 21:43:21 -06:00
Xuan-Son Nguyen	ddcb75dd8a	server: add auto-sleep after N seconds of idle (#18228 ) * implement sleeping at queue level * implement server-context suspend * add test * add docs * optimization: add fast path * make sure to free llama_init * nits * fix use-after-free * allow /models to be accessed during sleeping, fix use-after-free * don't allow accessing /models during sleep, it is not thread-safe * fix data race on accessing props and model_meta * small clean up * trailing whitespace * rm outdated comments	2025-12-21 02:24:42 +01:00
Xuan-Son Nguyen	9e39a1e6a9	server: support load model on startup, support preset-only options (#18206 ) * server: support autoload model, support preset-only options * add docs * load-on-startup * fix * Update common/arg.cpp Co-authored-by: Pascal <admin@serveurperso.com> --------- Co-authored-by: Pascal <admin@serveurperso.com>	2025-12-20 09:25:27 +01:00
Pascal	14931a826e	arg: fix order to use short form before long form (#18196 ) * arg: fix order to use short form before long form * arg: update doc * arg: update test-arg-parser * arg: address review feedback from ngxson simplified to check first.length() <= last.length() only fixed: --sampler-seq, --rerank, --draft ordering note: middle positions in 3+ arg sets are not verified * arg: update doc	2025-12-19 18:01:56 +01:00
Sheldon Robinson	390a505011	Merge branch 'ggml-org:master' into master	2025-12-19 08:50:20 -05:00
Xuan-Son Nguyen	98c1c7a7bf	presets: refactor, allow cascade presets from different sources, add global section (#18169 ) * presets: refactor, allow cascade presets from different sources * update docs * fix neg arg handling * fix empty mmproj * also filter out server-controlled args before to_ini() * skip loading custom_models if not specified * fix unset_reserved_args * fix crash on windows	2025-12-19 12:08:20 +01:00
Xuan-Son Nguyen	8ea958d4d9	model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106 ) * ASR with LFM2-Audio-1.5B * Set rope_theta * Fix comment * Remove rope_theta setting * Address PR feedback * rename functions to conformer * remove some redundant ggml_cont * fix missing tensor * add prefix "a." for conv tensors * remove redundant reshape * clean up * add test model --------- Co-authored-by: Tarek Dakhran <tarek@liquid.ai>	2025-12-19 00:18:01 +01:00
Xuan-Son Nguyen	4d1316c440	arg: fix ASAN error on sampler_type_names empty (#18167 )	2025-12-18 14:30:32 +01:00
Pascal	6ce3d85796	server: (webui) add --webui-config (#18028 ) * server/webui: add server-side WebUI config support Add CLI arguments --webui-config (inline JSON) and --webui-config-file (file path) to configure WebUI default settings from server side. Backend changes: - Parse JSON once in server_context::load_model() for performance - Cache parsed config in webui_settings member (zero overhead on /props) - Add proper error handling in router mode with try/catch - Expose webui_settings in /props endpoint for both router and child modes Frontend changes: - Add 14 configurable WebUI settings via parameter sync - Add tests for webui settings extraction - Fix subpath support with base path in API calls Addresses feedback from @ngxson and @ggerganov * server: address review feedback from ngxson * server: regenerate README with llama-gen-docs	2025-12-17 21:45:45 +01:00
Georgi Gerganov	4301e27319	common : restore grammar-based rejection sampling (#18137 ) * common : restart grammar-based rejection sampling * sampling : allow null samplers	2025-12-17 19:46:00 +02:00
Johannes Gäßler	a2c199e479	common: clarify instructions for bug reports (#18134 )	2025-12-17 18:44:13 +01:00
Pascal	487674fbb3	common: fix --override-kv to support comma-separated values (#18056 ) * common: fix --override-kv to support comma-separated values * Update common/arg.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * common: deprecate repeated arguments, suggest comma-separated values * common: add comma escape support for --override-kv * common: optimize duplicate detection with insert().second Co-authored-by: personalmountains <46615898+personalmountains@users.noreply.github.com> * common: migrate all repeated args to comma-separated syntax --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> Co-authored-by: personalmountains <46615898+personalmountains@users.noreply.github.com>	2025-12-17 11:36:23 +02:00
TrevorS	4b2a4778f8	arg: allow -kvu flag for llama-perplexity (#18117 ) The -kvu (--kv-unified) flag is required for hellaswag and winogrande benchmarks which use coupled sequences. Without unified KV cache, these benchmarks fail with: split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) This change adds LLAMA_EXAMPLE_PERPLEXITY to the allowed examples for the -kvu argument, enabling its use with llama-perplexity.	2025-12-17 08:33:02 +02:00
Xuan-Son Nguyen	7b1db3d3b7	arg: clarify auto kvu/np being set on server (#17997 ) * arg: clarify auto kvu/np being set on server * improve docs * use invalid_argument	2025-12-16 12:01:27 +01:00
Aldehir Rojas	c05aa69f32	common : add nemotron 3 parsing (#18077 ) * common : expose json-schema functionality to extract type info * common : fix peg parser negation during needs_more_input * common : add some defensive measures in constructed peg parser * common : add nemotron nano 3 support * common : add nemotron nano 3 tests * remove debug line	2025-12-16 04:05:23 -06:00
Johannes Gäßler	b1f3a6e5db	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653 ) * llama: automatically fit args to free memory llama-fit-params tool * fix CI * hints for bug reports, ensure no reallocation * fix segfault with Vulkan * add llama-fit-params to CI * fix CI * fix CI * fix CI * minor adjustments * fix assignment of 1 dense layer * fix logger not being reset on model load failure * remove --n-gpu-layer hint on model load failure * fix llama-fit-params verbosity * fix edge case * fix typo [no ci]	2025-12-15 09:24:59 +01:00
Xuan-Son Nguyen	52392291b2	preset: handle negated arg, reverse the meaning if needed (#18041 )	2025-12-14 22:08:10 +01:00
Georgi Gerganov	254098a279	common : refactor common_sampler + grammar logic changes (#17937 ) * common : refactor common_sampler + grammar logic changes * tests : increase max_tokens to get needed response * batched : fix uninitialized samplers	2025-12-14 10:11:13 +02:00
Xuan-Son Nguyen	4d5ae24c0a	arg: fix common_params_parse not accepting negated arg (#17991 )	2025-12-13 12:53:37 +01:00
Sigbjørn Skjæret	8e4d678528	common : skip model validation when --completion-bash is requested (#17975 )	2025-12-13 08:40:50 +01:00
Sigbjørn Skjæret	2bc94e7928	add llama-completion to completion-bash executables (#17976 )	2025-12-13 08:35:50 +01:00
Xuan-Son Nguyen	380b4c984e	common: support negated args (#17919 ) * args: support negated args * update docs * fix typo * add more neg options * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * rm duplicated arg * fix LLAMA_ARG_NO_HOST * add test --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-12 23:58:53 +01:00
Xuan-Son Nguyen	54a0fee4b7	arg: add -mm and -mmu as short form of --mmproj and --mmproj-url (#17958 ) * arg: add -mm and -mmu as short form of --mmproj and --mmproj-url * correct order * update docs	2025-12-12 14:06:06 +01:00
Adrien Gallouët	b8ee22cfde	common : add minimalist multi-thread progress bar (#17602 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-12-12 12:44:35 +01:00
Xuan-Son Nguyen	34a6d86982	cli: enable jinja by default (#17911 ) * cli: enable jinja by default * Update common/arg.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-10 22:19:42 +01:00
Pascal	f32ca51bfe	server: add presets (config) when using multiple models (#17859 ) * llama-server: recursive GGUF loading Replace flat directory scan with recursive traversal using std::filesystem::recursive_directory_iterator. Support for nested vendor/model layouts (e.g. vendor/model/.gguf). Model name now reflects the relative path within --models-dir instead of just the filename. Aggregate files by parent directory via std::map before constructing local_model server : router config POC (INI-based per-model settings) * server: address review feedback from @aldehir and @ngxson PEG parser usage improvements: - Simplify parser instantiation (remove arena indirection) - Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping) - Fix last line without newline bug (+ operator instead of <<) - Remove redundant end position check Feature scope: - Remove auto-reload feature (will be separate PR per @ngxson) - Keep config.ini auto-creation and template generation - Preserve per-model customization logic Co-authored-by: aldehir <aldehir@users.noreply.github.com> Co-authored-by: ngxson <ngxson@users.noreply.github.com> * server: adopt aldehir's line-oriented PEG parser Complete rewrite of INI parser grammar and visitor: - Use p.chars(), p.negate(), p.any() instead of p.until() - Support end-of-line comments (key=value # comment) - Handle EOF without trailing newline correctly - Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]) - Simplified visitor (no pending state, no trim needed) - Grammar handles whitespace natively via eol rule Business validation preserved: - Reject section names starting with LLAMA_ARG_ - Accept only keys starting with LLAMA_ARG_* - Require explicit section before key-value pairs Co-authored-by: aldehir <aldehir@users.noreply.github.com> * server: fix CLI/env duplication in child processes Children now receive minimal CLI args (executable, model, port, alias) instead of inheriting all router args. 
Global settings pass through LLAMA_ARG_* environment variables only, eliminating duplicate config warnings. Fixes: Router args like -ngl, -fa were passed both via CLI and env, causing 'will be overwritten' warnings on every child spawn * add common/preset.cpp * fix compile * cont * allow custom-path models * add falsey check * server: fix router model discovery and child process spawning - Sanitize model names: replace / and \ with _ for display - Recursive directory scan with relative path storage - Convert relative paths to absolute when spawning children - Filter router control args from child processes - Refresh args after port assignment for correct port value - Fallback preset lookup for compatibility - Fix missing argv[0]: store server binary path before base_args parsing * Revert "server: fix router model discovery and child process spawning" This reverts commit e3832b42eeea7fcb108995966c7584479f745857. * clarify about "no-" prefix * correct render_args() to include binary path * also remove arg LLAMA_ARG_MODELS_PRESET for child * add co-author for ini parser code Co-authored-by: aldehir <hello@alde.dev> * also set LLAMA_ARG_HOST * add CHILD_ADDR * Remove dead code --------- Co-authored-by: aldehir <aldehir@users.noreply.github.com> Co-authored-by: ngxson <ngxson@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: aldehir <hello@alde.dev>	2025-12-10 22:18:21 +01:00
Xuan-Son Nguyen	6c2131773c	cli: new CLI experience (#17824 ) * wip * wip * fix logging, add display info * handle commands * add args * wip * move old cli to llama-completion * rm deprecation notice * move server to a shared library * move ci to llama-completion * add loading animation * add --show-timings arg * add /read command, improve LOG_ERR * add args for speculative decoding, enable show timings by default * add arg --image and --audio * fix windows build * support reasoning_content * fix llama2c workflow * color default is auto * fix merge conflicts * properly fix color problem Co-authored-by: bandoti <bandoti@users.noreply.github.com> * better loading spinner * make sure to clean color on force-exit * also clear input files on "/clear" * simplify common_log_flush * add warning in mtmd-cli * implement console writer * fix data race * add attribute * fix llama-completion and mtmd-cli * add some notes about console::log * fix compilation --------- Co-authored-by: bandoti <bandoti@users.noreply.github.com>	2025-12-10 15:28:59 +01:00
Aldehir Rojas	2fbe3b7bb7	common : add parser for ministral/mistral large 3/devstral 2 (#17713 )	2025-12-09 17:31:04 -06:00
Xuan-Son Nguyen	4e842d5120	console: allow using arrow left/right, home/end keys and history mode (#17836 ) * console: allow using arrow left/right to edit the line (with UTF-8 support) * console: fix arrow keys on Windows using private-use Unicode * console: add Home/End key support for Windows and Linux * console: add basic Up/Down history navigation * fix build * console: allow using arrow left/right to edit the line (with UTF-8 support) * console: fix arrow keys on Windows using private-use Unicode * console: add Home/End key support for Windows and Linux * console: add basic Up/Down history navigation * console: remove unreachable wc == 0 check after VK switch * console: add Ctrl+Left/Right word navigation - Add KEY_CTRL_ARROW_LEFT and KEY_CTRL_ARROW_RIGHT codes - Windows: detect CTRL modifier via dwControlKeyState - Linux: parse ANSI sequences with modifier (1;5D/C) - Implement move_word_left/right with space-skipping logic - Refactor escape sequence parsing to accumulate params * console: add Delete key support - Windows: VK_DELETE detection - Linux: ESC[3~ sequence parsing - Forward character deletion with UTF-8 support * console: implement bash-style history editing - Edit any history line during UP/DOWN navigation, edits persist - Pressing Enter appends edited version as new history entry - Original lines stay untouched in their positions * clean up * better history impl * fix decode_utf8 --------- Co-authored-by: Pascal <admin@serveurperso.com>	2025-12-09 11:53:59 +01:00
hksdpc255	636fc17a37	Fix Kimi-K2 tool-call parsing issues (#17376 ) * Fix kimi-k2 parsing * fix template & add more tests for kimi-k2 * Another fix for Kimi-K2 chat template. * enable allow_toolcall_in_think for Kimi-K2 * Refine key-value separator and value end format * Enable tool call in think for kimi-k2 * allow_toolcall_in_think is now tested with Kimi-K2 * Remove outdated TODO comment in XML tool call parser Removed TODO comment about untested tool call feature. * Rename function from "utf8_truncate_safe" to "utf8_truncate_safe_len"	2025-12-08 14:32:04 +01:00
Sigbjørn Skjæret	22577583a3	common : change --color to accept on/off/auto, default to auto (#17827 )	2025-12-07 03:43:50 +01:00
Sheldon Robinson	dd4c0b4788	Add peg-parser.h include to chat-parser.h	2025-12-04 20:10:45 -05:00
Sheldon Robinson	a2853260ff	Update chat-parser.cpp	2025-12-04 19:17:53 -05:00
Sheldon Robinson	f08a068206	Remove common_chat_parse function Removed the common_chat_parse function and its implementation.	2025-12-04 19:10:58 -05:00
Sheldon Robinson	98015388de	Remove deprecated chat parse functions Removed deprecated chat parsing functions.	2025-12-04 19:05:43 -05:00
Sheldon Robinson	f8d3806162	Add new parsing functions for chat messages	2025-12-04 19:05:28 -05:00
Sheldon Robinson	82732fe002	Remove unused common_chat_parse function declaration	2025-12-04 19:02:12 -05:00
Sheldon Robinson	2b93ea3aa3	Add common_chat_parse function declaration Fixes #17771	2025-12-04 18:56:12 -05:00
Sheldon Robinson	ce174073ba	Add common_chat_parse function for message parsing Fixes #17771	2025-12-04 18:05:59 -05:00
Daniel Bevenius	bd4ef13476	common : skip model validation when --help is requested (#17755 ) This commit skips the model validation check when the user specifies the --help option. The motivation for this is that currently an error is thrown before the --help could be processed. Now skips validation if params.usage is set, allowing help to display without requiring --model. Resolves: https://github.com/ggml-org/llama.cpp/issues/17754	2025-12-04 13:36:50 +01:00

679 Commits