llama.cpp

Commit Graph

Author	SHA1	Message	Date
AN Long	48cda24c11	server: remove the verbose_prompt parameter (#21059 ) * server: respect the verbose_prompt parameter * Revert "server: respect the verbose_prompt parameter" This reverts commit `8ed885cf37`. * Remove --verbose-prompt parameter from llama-server * Using set_examples instead of set_excludes	2026-03-27 13:36:13 +02:00
Xuan-Son Nguyen	20197b6fe3	server: add built-in tools backend support (#20898 ) * wip: server_tools * refactor * displayName -> display_name * snake_case everywhere * rm redundant field * change arg to --tools all * add readme mention * llama-gen-docs	2026-03-27 10:07:11 +01:00
Adrien Gallouët	287b5b1eab	common : add getpwuid fallback for HF cache when HOME is not set (#21035 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 20:34:23 +01:00
Adrien Gallouët	9900b29c3a	common : filter out imatrix when finding models (#21023 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 15:37:18 +01:00
Adrien Gallouët	93dfbc1291	common : make LLAMA_CACHE the one cache for everything (#21009 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 12:04:57 +01:00
Adrien Gallouët	3cba8bba18	common : fix split model migration (#21019 )
Sadly the manifest does not list all required files, i honestly thought it was the case
Without the files listed we don't have the sha256, so if the first file is valid, and all others have the correct size, then we can assume we are good and do the migration...
Here my test:
$ find /home/angt/.cache/llama.cpp
/home/angt/.cache/llama.cpp
/home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf
/home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf
/home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf.etag
/home/angt/.cache/llama.cpp/angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf.etag
/home/angt/.cache/llama.cpp/manifest=angt=test-split-model-stories260K=latest.json
$ build/bin/llama-server
================================================================================
WARNING: Migrating cache to HuggingFace cache directory
Old cache: /home/angt/.cache/llama.cpp/
New cache: /home/angt/.cache/huggingface/hub
This one-time migration moves models previously downloaded with -hf from the legacy llama.cpp cache to the standard HuggingFace cache. Models downloaded with --model-url are not affected.
================================================================================
migrate_file: migrated angt_test-split-model-stories260K_stories260K-f32-00001-of-00002.gguf -> /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00001-of-00002.gguf
migrate_file: migrated angt_test-split-model-stories260K_stories260K-f32-00002-of-00002.gguf -> /home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00002-of-00002.gguf
migrate_old_cache_to_hf_cache: migration complete, deleting manifest: /home/angt/.cache/llama.cpp/manifest=angt=test-split-model-stories260K=latest.json
$ find /home/angt/.cache/llama.cpp /home/angt/.cache/huggingface
/home/angt/.cache/llama.cpp
/home/angt/.cache/huggingface
/home/angt/.cache/huggingface/hub
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs/50d019817c2626eb9e8a41f361ff5bfa538757e6f708a3076cd3356354a75694
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/blobs/7b273e1dbfab11dc67dce479deb5923fef27c39cbf56a20b3a928a47b77dab3c
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/refs
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/refs/main
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00002-of-00002.gguf
/home/angt/.cache/huggingface/hub/models--angt--test-split-model-stories260K/snapshots/68c3ea2061e8c7688455fab07597dde0f4d7f0db/stories260K-f32-00001-of-00002.gguf
Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-26 12:04:37 +01:00
Adrien Gallouët	c0159f9c1f	common : do not delete old files from the old cache when updating (#21000 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-25 22:28:04 +01:00
Adrien Gallouët	056b50c319	common : fix verbosity setup (#20989 ) The verbosity threshold was set at the end of common_params_parse_ex(), after doing many things (like downloading files..) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-25 19:41:01 +01:00
Adrien Gallouët	f2c72b8f1f	common : fix gguf selection in common_list_cached_models (#20996 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-25 19:18:06 +01:00
Xuan-Son Nguyen	914eb5ff0c	jinja: fix macro with kwargs (#20960 ) * jinja: fix macro with kwargs * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix newline problem --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-25 12:22:48 +01:00
Adrien Gallouët	42ebce3beb	common : fix get_gguf_split_info (#20946 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-24 13:33:14 +01:00
Adrien Gallouët	2d2d9c2062	common : add a WARNING for HF cache migration (#20935 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-24 09:24:39 +01:00
Adrien Gallouët	8c7957ca33	common : add standard Hugging Face cache support (#20775 ) * common : add standard Hugging Face cache support - Use HF API to find all files - Migrate all manifests to hugging face cache at startup Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Check with the quant tag Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Cleanup Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Improve error handling and report API errors Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Restore common_cached_model_info and align mmproj filtering Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Prefer main when getting cached ref Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Use cached files when HF API fails Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Use final_path.. Signed-off-by: Adrien Gallouët <angt@huggingface.co> * Check all inputs Signed-off-by: Adrien Gallouët <angt@huggingface.co> --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-24 07:30:33 +01:00
Aldehir Rojas	312d870a89	common : replace wrap_for_generation with a prefix convenience function and fix gpt-oss (#20912 )	2026-03-23 22:21:47 -05:00
Jhen-Jie Hong	7a0b6a635e	common/autoparser : detect reasoning markers when enable_thinking changes system prompt (#20859 )	2026-03-23 08:35:27 +01:00
Sigbjørn Skjæret	23c9182ce8	jinja : refactor token advancement (#20864 ) * refactor token advancement * exercise sub-expressions	2026-03-22 17:45:10 +01:00
ddh0	3306dbaef7	misc : prefer ggml-org models in docs and examples (#20827 ) * misc : prefer ggml-org models in docs and examples Prefer referring to known-good quantizations under ggml-org rather than 3rd-party uploaders. * remove accidentally committed file	2026-03-21 22:00:26 +01:00
Piotr Wilkin (ilintar)	b1c70e2e54	common/parser: fix nasty bug causing subtle corruption of generation prompt (#20825 )	2026-03-21 00:19:04 +01:00
James O'Leary	149b2493c0	common : fix typo in debug log ('extracft' -> 'extract') (#20807 )	2026-03-20 18:23:18 +01:00
Ruikai Peng	21c8045214	jinja : fix heap OOB read in value equality comparison (#20782 ) Address GHSA-q9j6-4hhc-rq9p and GHSA-2q4c-9gq5-5vfp. The three-iterator overload of std::equal in value_array_t::equivalent() and value_object_t::equivalent() reads past the end of the shorter container when comparing arrays or objects of different lengths. Use the four-iterator overload (C++14) which checks both range lengths. Found-by: Pwno	2026-03-20 07:15:17 +01:00
James O'Leary	c46583b86b	common/parser : fix out_of_range crash in throw path (#20424 regression) (#20777 ) * chat : fix out_of_range crash in throw path (#20424 regression) #20424 introduced effective_input = generation_prompt + input, but the throw path uses input.substr(result.end) where result.end is a position within effective_input. Every thinking model with a non-empty generation_prompt crashes with std::out_of_range instead of the intended error message. Test crashes on unpatched master, passes with fix: cmake -B build -DLLAMA_BUILD_TESTS=ON -DLLAMA_BUILD_TOOLS=OFF cmake --build build --target test-chat ./build/bin/test-chat * Update test-chat.cpp * Update test-chat.cpp * Update test-chat.cpp --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>	2026-03-20 02:37:22 +01:00
James O'Leary	76f2dc70c3	chat : handle tool calls with no required args in TAG_WITH_TAGGED format (#20764 ) * chat : handle tool calls with no required args in TAG_WITH_TAGGED format * Update tests/test-chat.cpp [no ci] Co-authored-by: Aldehir Rojas <hello@alde.dev> --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com> Co-authored-by: Aldehir Rojas <hello@alde.dev>	2026-03-19 17:53:11 +01:00
Piotr Wilkin (ilintar)	5e54d51b19	common/parser: add proper reasoning tag prefill reading (#20424 ) * Implement proper prefill extraction * Refactor cli parameters, update docs, move reasoning budget sampler part to common/reasoning-budget.cpp * Update tools/server/server-task.cpp * refactor: move grammars to variant, remove grammar_external, handle exception internally * Make code less C++y Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-19 16:58:21 +01:00
ddh0	922b90e567	common : add LLAMA_ARG_SPEC_TYPE (#20744 )	2026-03-19 16:16:55 +01:00
Aldehir Rojas	1b9bbaa357	common : fix gpt-oss content removal (#20745 )	2026-03-19 11:40:39 +01:00
Pop Flamingo	312cf03328	llama : re-enable manual LoRA adapter free (#19983 ) * Re-enable manual LoRA adapter free * Remove stale "all adapters must be loaded before context creation" stale comments	2026-03-18 12:03:26 +02:00
Aldehir Rojas	5e8910a0db	common : rework gpt-oss parser (#20393 ) * common : rework gpt-oss parser * cont : fix gpt-oss tests * cont : add structured output test * cont : rename final to final_msg	2026-03-18 10:41:25 +01:00
Piotr Wilkin (ilintar)	d2ecd2d1cf	common/parser: add `--skip-chat-parsing` to force a pure content parser. (#20289 ) * Add `--force-pure-content` to force a pure content parser. * Update common/arg.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Change parameter name [no ci] --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-17 16:16:43 +01:00
Aldehir Rojas	1bbec6a75d	jinja : add capability check for object args (#20612 )	2026-03-16 17:43:14 +01:00
Masato Nakasaka	d3936498a3	common : fix iterator::end() dereference (#20445 )	2026-03-16 08:50:38 +02:00
Eric Hsieh	559646472d	fix: prevent nullptr dereference (#20552 )	2026-03-15 16:51:49 +01:00
Piotr Wilkin (ilintar)	1430c35948	common/parser: gracefully handle undetected tool parser, print error message. (#20286 )	2026-03-13 20:56:10 +01:00
Ruben Ortlam	128142fe7d	test-backend-ops: allow loading tests from file and parsing model operators into file (#19896 ) * tests: allow loading test-backend-ops tests from json * add error threshold based on op * add error when file cannot be read * add graph operator json extraction tool * add nb parameter for non-contiguous input tensors * fix view check * only use view if non-contiguous/permuted, use C++ random instead of rand() * replace internal API calls with public llama_graph_reserve call * reduce test description length * fix nb[0] not getting set for view * add name to tests * fix inplace error * use text file instead of json * move llama_graph_reserve function to new llama-ext header, move export-graph-ops to tests/ * fix missing declaration * use pragma once * fix indent * fix Windows build	2026-03-12 13:26:00 +01:00
Daniel Bevenius	6de1bc631d	common : update completion executables list [no ci] (#19934 ) This commit updates the bash completion executables list, adding missing executables and removing some that no longer exist.	2026-03-12 12:12:01 +01:00
Mishusha	a8304b4d27	common/parser: add GigaChatV3/3.1 models support (#19931 ) Co-authored-by: Mishusha <pmv26021975@gmail.com>	2026-03-12 01:22:25 +01:00
ddh0	4a748b8f15	common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (#20416 )	2026-03-12 00:13:28 +01:00
Aldehir Rojas	b5fe4559ae	common/parser: use nlohmann::ordered_json to preserve parameter order (#20385 )	2026-03-11 10:26:51 +01:00
Piotr Wilkin (ilintar)	acb7c79069	common/parser: handle reasoning budget (#20297 ) * v1 * Finished! * Handle cli * Reasoning sampler * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Less explosive terminology :) * Add utf-8 case and tests * common : migrate reasoning budget sampler to common * cont : clean up * cont : expose state and allow passing as initial state * cont : remove unused imports * cont : update state machine doc string --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Alde Rojas <hello@alde.dev>	2026-03-11 10:26:12 +01:00
Piotr Wilkin (ilintar)	6c770d16ca	Reduce level of content parser warning message to avoid log spam on non-debug verbosity (#20347 )	2026-03-10 15:21:51 +01:00
Sigbjørn Skjæret	ec947d2b16	common : fix incorrect uses of stoul (#20313 )	2026-03-10 11:40:26 +01:00
Aldehir Rojas	c96f608d98	common: consolidate PEG string parsers (#20263 ) * common : consolidate PEG string parsers * cont : fix json_string_content()	2026-03-10 00:29:21 +01:00
Evan Huus	23fbfcb1ad	server: Parse port numbers from MCP server URLs in CORS proxy (#20208 ) * Parse port numbers from MCP server URLs * Pass scheme to http proxy for determining whether to use SSL * Fix download on non-standard port and re-add port to logging * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-09 17:47:54 +01:00
Piotr Wilkin (ilintar)	f76565db92	common: map developer role to system (#20215 ) * Map developer role to system * Simplify	2026-03-09 14:25:11 +01:00
Piotr Wilkin (ilintar)	97c64fbdbd	PEG parser for LFM2 (#20251 ) * PEG parser for LFM2 * Simplify using python_value()	2026-03-09 01:11:22 +01:00
Aldehir Rojas	451ef08432	common : gracefully handle incomplete output (#20191 ) * common : handle incomplete UTF-8 at end of input in PEG parser * cont : if reached end prematurely, emit needs_more_input to propagate partial output * cont: refactor peg parse context to add lenient flag * cont : remove partial flag, keep lenient flag	2026-03-08 17:17:02 +01:00
Piotr Wilkin (ilintar)	9b24886f78	Fix compile bug (#20203 ) * Fix compile bug * Update common/chat-auto-parser-helpers.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-08 17:15:49 +01:00
Piotr Wilkin (ilintar)	62b8143ad2	Fix structured outputs (#20223 ) * Fix structured outputs * Update common/chat-auto-parser-generator.cpp Co-authored-by: Aldehir Rojas <hello@alde.dev> --------- Co-authored-by: Aldehir Rojas <hello@alde.dev>	2026-03-08 17:14:43 +01:00
Johannes Gäßler	a976ff081b	llama: end-to-end tests (#19802 ) * tests: add end-to-end tests per model architecture * fixup for rebase * fix use-after-free in llama-model-loader.cpp * fix CI * fix WebGPU * fix CI * disable CI for macOS-latest-cmake-arm64 * use expert_weights_scale only if != 0.0f * comments	2026-03-08 12:30:21 +01:00
Piotr Wilkin (ilintar)	b283f6d5b3	Revert to OAI-compatible args (#20213 ) * Revert to OAI-compatible args * Apply workaround::func_args_not_string	2026-03-08 11:33:03 +01:00
Piotr Wilkin (ilintar)	c024d85908	Autoparser: True streaming (#20177 ) * Relax atomicity constraint for nicer, more pleasant, True Streaming parsing * Whitespace * Remove redundant atomics	2026-03-07 01:55:33 +01:00
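
Note on commit 21c8045214 (jinja : fix heap OOB read in value equality comparison): the message says the three-iterator overload of std::equal reads past the end of the shorter container, and that the four-iterator overload (C++14) checks both range lengths. The sketch below illustrates only that standard-library behaviour. The helper name `equivalent` and the use of std::vector are assumptions made for illustration; this is not the code from value_array_t::equivalent() or value_object_t::equivalent().

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper for illustration only, not the llama.cpp implementation.
// Compares two sequences element by element.
template <typename T>
bool equivalent(const std::vector<T> & a, const std::vector<T> & b) {
    // The three-iterator form, std::equal(a.begin(), a.end(), b.begin()),
    // assumes b holds at least a.size() elements. If b is shorter, it reads
    // past b.end(), which is undefined behaviour.
    //
    // The four-iterator form (C++14) takes both ends, so ranges of different
    // lengths compare unequal without any out-of-bounds read.
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
```

With this helper, comparing {1, 2, 3} against {1, 2} returns false without touching memory beyond the second vector, while two vectors with equal contents still compare equal.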


804 Commits