llama.cpp

Commit Graph

Author	SHA1	Message	Date
Pascal	5c28b7a2ee	chore: update webui build output	2026-01-17 18:38:50 +01:00
Pascal	fca7177eae	fix: ignore assistant attachments (MCP) for modality detection	2026-01-17 18:36:41 +01:00
Pascal	3572667788	chore: update webui build output	2026-01-17 16:35:54 +01:00
Pascal	506da17931	refactor: eliminate MCP circular dependency - Change architecture from mcpStore <-> mcpClient to mcpClient -> mcpStore - Remove bidirectional callback pattern (setCallback, notify methods) - Add updateState/updateHealthCheck public methods in mcpStore - Replace callback calls with direct mcpStore method calls - Remove unused imports (browser, HealthCheckState) and constructor - Fixes CI: ReferenceError Cannot access mcpClient before initialization	2026-01-17 16:30:42 +01:00
Pascal	9b3417703f	fix: remove obsolete modality UI tests causing CI failures - Remove VisionModality/AudioModality test stories - Remove mockServerProps usage and imports - Simplify Default test (remove dropdown interaction checks) - Simplify FileAttachments test (remove mocks)	2026-01-17 16:30:36 +01:00
Pascal	a723238245	chore: update webui build output	2026-01-16 19:52:23 +01:00
Pascal	229aba7c3e	fix: strip reasoning content and UI proprietary tags from prompts TODO: add toggle and ensure backend API compliance for reasoning format	2026-01-16 19:50:36 +01:00
Pascal	f09395821b	chore: update webui build output	2026-01-16 15:22:46 +01:00
Pascal	78c6380222	refactor: remove reasoning after first turn filter	2026-01-16 15:19:50 +01:00
Pascal	2973c64609	refactor: inline reasoning with tags, remove fixed thinking field	2026-01-16 15:19:42 +01:00
Pascal	a1550ab77d	chore: update webui build output	2026-01-16 11:02:17 +01:00
Pascal	db37b712b2	feat: resolve MCP attachment images via rehype plugin LLM can reference tool-generated images using markdown links like, plugin resolves attachment names to base64 from message.extra when present, regular HTTP/data URLs pass through unchanged (no regression) - rehypeResolveAttachmentImages plugin in markdown pipeline - Pass message prop to MarkdownContent and AgenticContent - Force processor reactivity on message.extra changes - Filter assistant images from API context (display-only)	2026-01-16 10:49:28 +01:00
Pascal	a3c2144c1d	feat: persist base64 attachments from tool results	2026-01-16 08:07:20 +01:00
Pascal	a377605f60	webui: fix custom headers persistence in UI (derived)	2026-01-15 20:36:14 +01:00
Pascal	3360f60b94	webui: fix custom headers persistence in UI	2026-01-15 20:13:01 +01:00
Aleksander Grygier	cffc3b46ae	fix: Word wrapping	2026-01-15 17:59:57 +01:00
Aleksander Grygier	5417a439ef	chore: update webui build output	2026-01-15 11:39:10 +01:00
Aleksander Grygier	30a585bb96	feat: UI improvements	2026-01-14 17:32:57 +01:00
Aleksander Grygier	886939c550	chore: update webui build output	2026-01-14 14:39:32 +01:00
Aleksander Grygier	39848ee12f	feat: UI improvement	2026-01-14 14:26:41 +01:00
Aleksander Grygier	c1ac8d7326	chore: update webui build output	2026-01-14 13:22:01 +01:00
Aleksander Grygier	afdae742e3	Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp	2026-01-14 13:20:25 +01:00
Aleksander Grygier	b11b32ea28	chore: update webui build output	2026-01-14 12:47:13 +01:00
Aleksander Grygier	06efeb6eb9	chore: update webui build output	2026-01-14 11:49:26 +01:00
Aleksander Grygier	f89bcb90ca	feat: MCP Server Details	2026-01-14 11:45:47 +01:00
Jeff Bolz	3e4bb29666	vulkan: Check maxStorageBufferRange in supports_op (#18709) * vulkan: Check maxStorageBufferRange in supports_op * skip maxStorageBufferRange check when shader64BitIndexing is enabled	2026-01-14 10:59:05 +01:00
Aman Gupta	47f9612492	llama-model: fix unfortunate typo (#18832)	2026-01-14 17:55:15 +08:00
Daniel Bevenius	01cbdfd7eb	CUDA : fix typo in clang pragma comment [no ci] (#18830)	2026-01-14 10:31:49 +01:00
Ruben Ortlam	635ef78ec5	vulkan: work around Intel fp16 bug in mmq (#18814)	2026-01-14 09:41:23 +01:00
Perry Naseck	7d587e5544	ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705)	2026-01-14 09:22:25 +02:00
Daniel Benjaminsson	d34aa07193	mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819) Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS. Skip the resource limit check on Haiku to allow mlock functionality to work without compile errors. Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.	2026-01-14 09:11:05 +02:00
Adrien Gallouët	f709c7a33f	ci, tests : use cmake to download models and remove libcurl dependency (#18791) * ci, tests : use cmake to download models and remove libcurl dependency * llama_dl_model -> llama_download_model * use EXPECTED_HASH for robust model downloading * Move llama_download_model to cmake/common.cmake Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-14 07:46:27 +01:00
ddh0	6e36299b47	llama : print_info alignment fix (#18708) * fix text spacing in print_info * align all	2026-01-14 00:05:11 +01:00
Junwon Hwang	60591f01d4	model : add EXAONE MoE (#18543) * Add EXAONE MoE implementations Co-authored-by: Junwon Hwang <nuclear1221@gmail.com> * Address PR feedback * Address PR feedback * [WIP] Add MTP for EXAONE-MoE * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback --------- Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>	2026-01-13 23:28:38 +01:00
Georgi Gerganov	e4832e3ae4	vocab : fix attribute overrides for harmony (#18806) * vocab : fix attribute overrides for harmony * cont : add warning log	2026-01-13 17:40:13 +02:00
Ruben Ortlam	960e5e3b46	llama-mmap: fix direct-io loading fallback EOF exception (#18801)	2026-01-13 15:57:07 +01:00
Daniel Bevenius	20ca2e12c4	model-conversion : remove -c 0 from model card template [no ci] (#18807) This commit removes the `-c, --ctx-size N` from the llama-server command in the model card template for causal models. The motivation for this is that -c 0 is the default and specifying it is redundant.	2026-01-13 14:13:10 +01:00
yulo	ea4a321f2a	HIP: add fattn-mma-f16 for RDNA4 (#18481) * finish VQ mma * flash_attn_ext_f16_iter * KQ_rowsum * correct exp * fix scale error * fix softmax scale * fix softmax scale * enable fattn on cpu side * fix random error * disable fattn-mma-f16 on rdna3 * fix wrong col for rdna * use identity mat to transpose * resolve conflicts * basic tuning for DeepSeek-R1-Distill-Qwen-1.5B * fix volta compile error * align rdna4 policy for fattn * adjust fattn policy * adjust kernel selection logic * update as the review comments * keep fattn-wmma logic * adjust kernel selection logic --------- Co-authored-by: zhang hui <you@example.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-01-13 13:52:16 +01:00
Johannes Gäßler	c1e79e610f	doc: ban AI-generated PR descriptions [no ci] (#18765)	2026-01-13 13:43:12 +01:00
Xuan-Son Nguyen	e047f9ee9d	mtmd: fix use_non_causal being reported incorrectly (#18793) * mtmd: fix use_non_causal being reported incorrectly * move clip_is_mrope to mtmd_decode_use_mrope * fix sloppy code ggml_cpy	2026-01-13 12:19:38 +01:00
Georgi Gerganov	0a57271ab6	CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (#18800)	2026-01-13 12:25:53 +02:00
Gabe Goodhart	076b0faf7d	graph : clean up t5 input builders (#18795) * fix: Remove unnecessary `h` loops where `h` was only ever 0 Branch: CleanUpT5InputBuilders Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Remove unnecessary padding loop that is never hit anymore The upper bound used to use GGML_PAD(n_tokens, GGML_KQ_MASK_PAD), but was removed in https://github.com/ggml-org/llama.cpp/pull/17910 leaving the loop dead. Branch: CleanUpT5InputBuilders Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2026-01-13 09:43:51 +01:00
Ruben Ortlam	db79dc06b1	llama-bench: add direct_io parameter (#18778)	2026-01-13 08:49:10 +01:00
Adrien Gallouët	537d4240d4	ci : remove libcurl in releases (#18775) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-12 21:43:02 +01:00
Aleksander Grygier	120f3c978c	chore: update webui build output	2026-01-12 18:27:54 +01:00
Aleksander Grygier	5407b2efab	feat: MCP connection details WIP	2026-01-12 18:26:48 +01:00
Radoslav Gerganov	bcf7546160	server : add arg for disabling prompt caching (#18776) * server : add arg for disabling prompt caching Disabling prompt caching is useful for clients who are restricted to sending only OpenAI-compat requests and want deterministic responses. * address review comments * address review comments	2026-01-12 19:21:34 +02:00
Aleksander Grygier	0009c0c300	refactor: MCP types and health check	2026-01-12 18:12:08 +01:00
Adrien Gallouët	36c5913c45	ci : use openssl for openEuler-latest-cmake-cann (#18779) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-12 17:29:00 +01:00
Adrien Gallouët	8e649571cd	vendor : update cpp-httplib to 0.30.1 (#18771) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-12 15:58:52 +01:00


7839 Commits