llama.cpp

Commit Graph

Author	SHA1	Message	Date
Piotr Wilkin (ilintar)	12a4a47e6a	Fix GLM 4.7 Lite MoE gating func (#18980 ) * Fix GLM 4.7 MoE gating func * Update src/models/deepseek2.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-model.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-01-21 12:35:20 +01:00
Matthieu Coudron	37c35f0e1c	gguf: display strerrno when cant load a model (#18884 ) I've had issues loading models with llama-server: [44039] E gguf_init_from_file: failed to open GGUF file 'mistral-7b-v0.1.Q8_0.gguf' and I was sure it could access the file. Seems like --models-dir and --models-presets dont interact like I thought they would but I salvaged this snippet that helps troubleshooting [44039] E gguf_init_from_file: failed to open GGUF file 'mistral-7b-v0.1.Q8_0.gguf' (errno No such file or directory)	2026-01-21 08:52:46 +02:00
Oliver Simons	5bd341c9a1	CUDA: Fix builds for older CCCL versions by ifdefing strided_iterator (#18964 ) * CUDA: Fix builds for older CCCL versions by ifdefing strided_iterator Strided iterator was added in [CCCL 3.1](https://github.com/NVIDIA/cccl/releases/tag/v3.1.0), which is packaged into [CTK 13.1](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#id5) * Unindent as per code review request	2026-01-21 02:34:29 +01:00
Adrien Gallouët	1c7cf94b22	common, server : use the same User-Agent by default (#18957 ) This commit also ensures that if a custom User-Agent is used, it will be the only one sent. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-20 18:28:43 +01:00
Xuan-Son Nguyen	2c1f199653	cli : fix reasoning responses in CLI (#18961 ) * cli : fix reasoning responses in CLI * fix build * fix build (2)	2026-01-20 18:23:25 +01:00
Oliver Simons	d1e3556481	CUDA: Replace init_offsets kernel with iterators in cub-based argsort (#18930 ) * CUDA: Replace `init_offsets` with iterators in argsort This is a QOL improvement, saving us the cost of materializing the iterator * Remove unnecessary include from top-k.cu	2026-01-20 20:11:01 +08:00
Adrien Gallouët	08f3f4a8a3	ggml : cleanup path_str() (#18928 ) - Remove pragmas as `std::codecvt_utf8` is not used. - Avoid implicit `strlen()`. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-20 11:42:49 +01:00
Georgi Gerganov	271191906c	metal : enable FA for MLA heads (#18950 )	2026-01-20 12:21:28 +02:00
Daniel Bevenius	7dee9ff59a	convert : use n_groups instead of hardcoded values in reshape (#18929 ) * convert : use n_groups instead of hardcoded values in reshape This commit modifies the conversion script for NemotronHModel to use the 'n_groups' hyperparameter, and allow Python to calculate the the last dimension, using -1, when reshaping the 'mixer.norm.weight' tensor. * use self.n_group instead of self.hparams["n_groups"]	2026-01-20 06:55:24 +01:00
Xuan-Son Nguyen	6df686bee6	server : refactor oai_parser_opt, move it to server_chat_params (#18937 ) * server_chat_params * move chat format into CLI * use meta whenever possible * clean up, no more chatml fallback	2026-01-19 23:28:01 +01:00
ddh0	1706a6d7c6	convert : support Glm4MoeLite (#18936 ) * initial commit for branch * add glm-4.7-flash, move tokenizer hash * use `glm4` pretok * silence flake8 E302 (CI) * apply review feedback * add <\|user\|> as eog * also add EOG `<\|observation\|>` * revert llama-vocab * inherit vocab from glm4 --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-01-19 23:09:20 +01:00
Sigbjørn Skjæret	959ecf7f23	jinja : fix undefined keys and attributes and int/float as bool (#18924 ) * fix undefined keys and attributes * add falsy tests * as_bool for integers and floats * more falsy/truthy tests * --typo	2026-01-19 20:29:43 +01:00
Sigbjørn Skjæret	4037093c66	ci : run test-jinja -py on high perf [no ci] (#18916 )	2026-01-19 20:29:15 +01:00
Lennart Austenfeld	18361c579c	server: fix memory reservations in populate_token_probs (#18787 )	2026-01-19 19:13:31 +01:00
Georgi Gerganov	365a3e8c31	ggml : add ggml_build_forward_select (#18550 ) * ggml : add ggml_build_forward_select * cuda : adapt CUDA graph compat to new feature * vulkan : update logic to handle command buffer closing * ggml : check compute for fusion * ggml : add comment	2026-01-19 20:03:19 +02:00
Aleksander Grygier	39d0ff485d	chore: update webui build output	2026-01-19 19:02:40 +01:00
Aleksander Grygier	8a95ec3ea6	feat: Improve MCP Server selection UI + lazy load health checks	2026-01-19 19:01:32 +01:00
Aleksander Grygier	cafb9c09d3	feat: UI improvements	2026-01-19 16:56:02 +01:00
Aleksander Grygier	54192b05fb	feat: Simplify MCP server enabling logic per chat Refactors MCP server enabling logic to remove the dependency on global settings. This simplifies the logic by directly checking the per-chat override status, and removes the need to pass the global enabled state as a parameter. Additionally: - Only shows MCP servers that are enabled in settings in the selector. - Sorts the servers by whether they are enabled for the current chat.	2026-01-19 16:43:53 +01:00
Aleksander Grygier	62ed7f112d	chore: update webui build output	2026-01-19 16:26:16 +01:00
Aleksander Grygier	d37683942b	fix: Missing onModelChange callback running assistant message re-generation	2026-01-19 16:25:49 +01:00
Daniel Bevenius	3d55846a5c	model-conversion : add BUILD_DIR variable to run-converted-model scripts (#18927 ) This commit adds a BUILD_DIR variable to the scripts used for running converted models. The motivation for this is that currently the `build` directory is hardcoded and it can be useful to specify a different build directory, with builds for different configurations.	2026-01-19 13:12:38 +01:00
Pascal	d6dfe8e064	chore: update webui build output	2026-01-19 12:12:52 +01:00
Pascal	058929d453	fix: acurate tool_response display	2026-01-19 12:11:06 +01:00
Julius Tischbein	287a33017b	llama : Extend fallback, fix fileno for dio file, exclude case that mmap uses dio file (#18887 )	2026-01-18 18:35:57 +02:00
Pascal	d92b621346	fix: unify MCP server label logic with simplified fallback	2026-01-18 13:10:03 +01:00
Francisco Herrera	293a1565dc	docs: add linux to index (#18907 )	2026-01-18 18:03:35 +08:00
Pascal	16a03eea36	chore: update webui build output	2026-01-18 10:43:45 +01:00
Pascal	d8af98f1ed	refactor: remove multimodal validation from model selector Remove all frontend validation logic that prevented users from selecting models based on multimodal capabilities. This refactoring removes restrictive UI code while maintaining full functionality - Vision models can describe images as text - That text remains useful for non-vision models - Chaining vision -> non-vision is a valid workflow - Users know their use case better than the UI - Users can return to vision models when needed	2026-01-18 10:42:01 +01:00
Xuan-Son Nguyen	fe44d35574	tests : add test-jinja -py option for cross-checking (#18906 ) * tests : add test-jinja -py option or cross-checking * Update tests/test-jinja.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix + add source * SandboxedEnvironment * fix array.map case --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-18 08:14:27 +01:00
Sigbjørn Skjæret	bbcdac0189	jinja : fix object item order (and properly implement dictsort) (#18904 ) * fix object item order * as_ordered_object * copy whole object	2026-01-18 03:40:06 +01:00
Sigbjørn Skjæret	d03c45c9c5	jinja : attribute support for join, map and sort (#18883 ) * support negative array index and default value * attribute support (int and str) for join, map and sort * add tests * update CODEOWNERS * improve fixme sorting comment	2026-01-18 02:53:01 +01:00
Sigbjørn Skjæret	10c98cbdf6	jinja : add missing tojson filter for bool (#18900 ) * add missing tojson for bool * add more literal tests	2026-01-18 01:05:09 +01:00
Sigbjørn Skjæret	420960ab92	jinja : fix lexing of float literals with sign (#18901 ) * fix lexing of float literals with sign * add test * consume_numeric	2026-01-18 00:57:51 +01:00
Xuan-Son Nguyen	f55b033ae6	jinja: correct member access rule (#18905 )	2026-01-18 00:48:55 +01:00
lhez	d1b4757ded	opencl: fix q6_K mv for m=1 (#18893 )	2026-01-17 13:50:32 -08:00
Sigbjørn Skjæret	57c0beaed0	ci : add label for jinja changes (#18903 )	2026-01-17 21:52:02 +01:00
Pascal	5c28b7a2ee	chore: update webui build output	2026-01-17 18:38:50 +01:00
Pascal	fca7177eae	fix: ignore assistant attachments (MCP) for modality detection	2026-01-17 18:36:41 +01:00
Pascal	3572667788	chore: update webui build output	2026-01-17 16:35:54 +01:00
Pascal	506da17931	refactor: eliminate MCP circular dependency - Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore - Remove bidirectional callback pattern (setCallback, notify methods) - Add updateState/updateHealthCheck public methods in mcpStore - Replace callback calls with direct mcpStore method calls - Remove unused imports (browser, HealthCheckState) and constructor - Fixes CI: ReferenceError Cannot access mcpClient before initialization	2026-01-17 16:30:42 +01:00
Pascal	9b3417703f	fix: remove obsolete modality UI tests causing CI failures - Remove VisionModality/AudioModality test stories - Remove mockServerProps usage and imports - Simplify Default test (remove dropdown interaction checks) - Simplify FileAttachments test (remove mocks)	2026-01-17 16:30:36 +01:00
Georgi Gerganov	2fbde785bc	kv-cache : optimize KQ mask construction (#18842 ) * kv-cache : optimize KQ mask construction * cont : add explanation + improve * cont : fix	2026-01-17 15:42:42 +02:00
Reese Levine	a89002f07b	ggml webgpu: support for backend sampling (#18880 ) * ggml webgpu: add SOFTPLUS unary operator Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * Follow Vulkan backend numerical stability pattern * ggml webgpu: add EXPM1 unary operator Implements EXPM1 (exp(x) - 1) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add FLOOR unary operator Implements FLOOR (rounds down to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add CEIL unary operator Implements CEIL (rounds up to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add ROUND unary operator Implements ROUND (rounds to nearest integer) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * ggml webgpu: add TRUNC unary operator Implements TRUNC (truncates towards zero) with f16/f32 support. * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace) * Register pipelines and device support * docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS) * Updates to webgpu get_memory * Add argmax * Add argmax,cumsum,sum,sum_rows * Add necessary CPY/GET_ROWS operators * Support for argsort using multi-pass strategy * Update set_rows for i32 indices, move to pre-wgsl * Port unary operators to pre-wgsl and support FILL * Implement PAD * Add support for top-k * clean up, scope pipeline init mutex * fix newline * Add support for log * Update LOG for better precision, and ops doc --------- Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>	2026-01-16 16:12:43 -08:00
Pascal	a723238245	chore: update webui build output	2026-01-16 19:52:23 +01:00
Pascal	229aba7c3e	fix: strip reasoning content and UI proprietary tags from prompts TODO: add toggle and ensure backend API compliance for reasoning format	2026-01-16 19:50:36 +01:00
Thore Koritzius	388ce82241	ggml : extend ggml_pool_1d + metal (#16429 ) * chore: resolve conflicts * feat: ggml metal impl * fix: ggml_metal_kargs_pool_1d struct * fix: require contiguous input * chore: test pool_1d * chore: limit pool1d test cases to p0=0 and s0=k0 to conform with asserts * chore: add p0 and s0 to testing * fix: allow padding for cpu and metal * Update ggml/src/ggml-metal/ggml-metal.metal * fix: correct single-threaded loop * ggml : cleanup * tests : add ne[1] != 1 tests * fix: ne[1] handling in np * cont : fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-16 16:59:56 +02:00
Pascal	f09395821b	chore: update webui build output	2026-01-16 15:22:46 +01:00
Pascal	78c6380222	refactor: remove reasoning after first turn filter	2026-01-16 15:19:50 +01:00
Pascal	2973c64609	refactor: inline reasoning with tags, remove fixed thinking field	2026-01-16 15:19:42 +01:00

1 2 3 4 5 ...

8003 Commits All Branches Search

8003 Commits

All Branches