llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aleksander Grygier	5417a439ef	chore: update webui build output	2026-01-15 11:39:10 +01:00
Aleksander Grygier	30a585bb96	feat: UI improvements	2026-01-14 17:32:57 +01:00
Aleksander Grygier	886939c550	chore: update webui build output	2026-01-14 14:39:32 +01:00
Aleksander Grygier	39848ee12f	feat: UI improvement	2026-01-14 14:26:41 +01:00
Aleksander Grygier	c1ac8d7326	chore: update webui build output	2026-01-14 13:22:01 +01:00
Aleksander Grygier	afdae742e3	Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp	2026-01-14 13:20:25 +01:00
Aleksander Grygier	b11b32ea28	chore: update webui build output	2026-01-14 12:47:13 +01:00
Aleksander Grygier	06efeb6eb9	chore: update webui build output	2026-01-14 11:49:26 +01:00
Aleksander Grygier	f89bcb90ca	feat: MCP Server Details	2026-01-14 11:45:47 +01:00
Jeff Bolz	3e4bb29666	vulkan: Check maxStorageBufferRange in supports_op (#18709) * vulkan: Check maxStorageBufferRange in supports_op * skip maxStorageBufferRange check when shader64BitIndexing is enabled	2026-01-14 10:59:05 +01:00
Aman Gupta	47f9612492	llama-model: fix unfortunate typo (#18832)	2026-01-14 17:55:15 +08:00
Daniel Bevenius	01cbdfd7eb	CUDA : fix typo in clang pragma comment [no ci] (#18830)	2026-01-14 10:31:49 +01:00
Ruben Ortlam	635ef78ec5	vulkan: work around Intel fp16 bug in mmq (#18814)	2026-01-14 09:41:23 +01:00
Perry Naseck	7d587e5544	ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705)	2026-01-14 09:22:25 +02:00
Daniel Benjaminsson	d34aa07193	mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819) Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS. Skip the resource limit check on Haiku to allow mlock functionality to work without compile errors. Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.	2026-01-14 09:11:05 +02:00
Adrien Gallouët	f709c7a33f	ci, tests : use cmake to download models and remove libcurl dependency (#18791) * ci, tests : use cmake to download models and remove libcurl dependency * llama_dl_model -> llama_download_model * use EXPECTED_HASH for robust model downloading * Move llama_download_model to cmake/common.cmake Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-14 07:46:27 +01:00
ddh0	6e36299b47	llama : print_info alignment fix (#18708) * fix text spacing in print_info * align all	2026-01-14 00:05:11 +01:00
Junwon Hwang	60591f01d4	model : add EXAONE MoE (#18543) * Add EXAONE MoE implementations Co-authored-by: Junwon Hwang <nuclear1221@gmail.com> * Address PR feedback * Address PR feedback * [WIP] Add MTP for EXAONE-MoE * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback --------- Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>	2026-01-13 23:28:38 +01:00
Georgi Gerganov	e4832e3ae4	vocab : fix attribute overrides for harmony (#18806) * vocab : fix attribute overrides for harmony * cont : add warning log	2026-01-13 17:40:13 +02:00
Ruben Ortlam	960e5e3b46	llama-mmap: fix direct-io loading fallback EOF exception (#18801)	2026-01-13 15:57:07 +01:00
Daniel Bevenius	20ca2e12c4	model-conversion : remove -c 0 from model card template [no ci] (#18807) This commit removes the `-c, --ctx-size N` from the llama-server command in the model card template for causal models. The motivation for this is that -c 0 is the default and specifying it is redundant.	2026-01-13 14:13:10 +01:00
yulo	ea4a321f2a	HIP: add fattn-mma-f16 for RDNA4 (#18481) * finish VQ mma * flash_attn_ext_f16_iter * KQ_rowsum * correct exp * fix scale error * fix softmax scale * fix softmax scale * enable fattn on cpu side * fix random error * disable fattn-mma-f16 on rdna3 * fix wrong col for rdna * use identity mat to transpose * resolve conflicts * basic tuning for DeepSeek-R1-Distill-Qwen-1.5B * fix volta compile error * align rdna4 policy for fattn * adjust fattn policy * adjust kernel selection logic * update as the review comments * keep fattn-wmma logic * adjust kernel selection logic --------- Co-authored-by: zhang hui <you@example.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-01-13 13:52:16 +01:00
Johannes Gäßler	c1e79e610f	doc: ban AI-generated PR descriptions [no ci] (#18765)	2026-01-13 13:43:12 +01:00
Xuan-Son Nguyen	e047f9ee9d	mtmd: fix use_non_causal being reported incorrectly (#18793) * mtmd: fix use_non_causal being reported incorrectly * move clip_is_mrope to mtmd_decode_use_mrope * fix sloppy code ggml_cpy	2026-01-13 12:19:38 +01:00
Georgi Gerganov	0a57271ab6	CUDA : fix unused argument when USE_CUDA_GRAPH=OFF (#18800)	2026-01-13 12:25:53 +02:00
Gabe Goodhart	076b0faf7d	graph : clean up t5 input builders (#18795) * fix: Remove unnecessary `h` loops where `h` was only ever 0 Branch: CleanUpT5InputBuilders Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Remove unnecessary padding loop that is never hit anymore The upper bound used to use GGML_PAD(n_tokens, GGML_KQ_MASK_PAD), but was removed in https://github.com/ggml-org/llama.cpp/pull/17910 leaving the loop dead. Branch: CleanUpT5InputBuilders Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>	2026-01-13 09:43:51 +01:00
Ruben Ortlam	db79dc06b1	llama-bench: add direct_io parameter (#18778)	2026-01-13 08:49:10 +01:00
Adrien Gallouët	537d4240d4	ci : remove libcurl in releases (#18775) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-12 21:43:02 +01:00
Aleksander Grygier	120f3c978c	chore: update webui build output	2026-01-12 18:27:54 +01:00
Aleksander Grygier	5407b2efab	feat: MCP connection details WIP	2026-01-12 18:26:48 +01:00
Radoslav Gerganov	bcf7546160	server : add arg for disabling prompt caching (#18776) * server : add arg for disabling prompt caching Disabling prompt caching is useful for clients who are restricted to sending only OpenAI-compat requests and want deterministic responses. * address review comments * address review comments	2026-01-12 19:21:34 +02:00
Aleksander Grygier	0009c0c300	refactor: MCP types and health check	2026-01-12 18:12:08 +01:00
Adrien Gallouët	36c5913c45	ci : use openssl for openEuler-latest-cmake-cann (#18779) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-12 17:29:00 +01:00
Adrien Gallouët	8e649571cd	vendor : update cpp-httplib to 0.30.1 (#18771) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-12 15:58:52 +01:00
Aleksander Grygier	0180becb8b	chore: update webui build output	2026-01-12 15:26:46 +01:00
Aleksander Grygier	08c1acd1db	refactor: KeyValuePairs component	2026-01-12 15:25:43 +01:00
Aleksander Grygier	392a6dce0d	chore: update webui build output	2026-01-12 15:15:19 +01:00
Aleksander Grygier	a44332b528	refactor: DRY	2026-01-12 15:10:18 +01:00
Aleksander Grygier	80e829a248	chore: update webui build output	2026-01-12 14:49:11 +01:00
Aleksander Grygier	60ef752d0f	refactor: Architecture improvements	2026-01-12 14:45:24 +01:00
Aleksander Grygier	a63a421952	chore: update webui build output	2026-01-12 14:18:15 +01:00
Aleksander Grygier	58ab834b18	refactor: MCP state management + stores/clients relationship	2026-01-12 14:17:06 +01:00
Daniel Bevenius	4150da9a95	examples : add --kv-unified to batched example (#18774) This commit adds the --kv-unified flag to the batched example. This flag is currently specified in the README.md as required, but is currently not available as a command line option for the batched example. The motivation for this is that specifying this flag as the README instructs, will lead to an error about the flag not being recognized, and without this option the example fail with the following error: ```console split_equal: sequential split is not supported when there are coupled sequences in the input batch (you may need to use the -kvu flag) decode: failed to find a memory slot for batch of size 4 main: llama_decode() failed ```	2026-01-12 13:47:58 +01:00
Jeff Bolz	8e2da778da	vulkan: change memory_logger to be controlled by an env var (#18769)	2026-01-12 13:32:55 +01:00
Xuan-Son Nguyen	ce3bf9b1a4	server: update docs for sleeping [no ci] (#18777)	2026-01-12 13:01:24 +01:00
Jeff Bolz	2bbe4c2cf8	vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678) This fixes incoherent output in Llama-4-Maverick-17B-128E-PAB-Q8_0, which has a mul_mat_id with an A matrix that's Q8_0 8192 x 5120 x 128. This should work when the number of blocks in the A matrix is less than 2^32 (for mul_mat_vec or mul_mm_cm2), or for mul_mm I think the limit is like 2^32*LOAD_VEC_A elements. - Divide batch_stride by QUANT_K earlier, so the block index calculation works in 32b. - Each vk_pipeline_struct has a linked list of pipelines that will allow it to handle variants. So far this change just adds a single use case for this, compiling with the e64BitIndexingEXT flag. - Use the 64b indexing variant when the A matrix is larger than maxStorageBufferRange. 64-bit indexing has some cost - around 3-5% in MoE models, so it's worth the effort to avoid enabling it unconditionally.	2026-01-12 12:32:13 +01:00
Aleksander Grygier	9c53bd4486	chore: update webui build output	2026-01-12 11:16:18 +01:00
Aleksander Grygier	528a560a25	fix: Distinguish streaming vs incomplete tool calls in UI	2026-01-12 11:15:58 +01:00
Aleksander Grygier	aa9054367a	chore: update webui build output	2026-01-12 11:10:24 +01:00
Aleksander Grygier	cead02ee58	fix: Restore live reactive UI progress for tool calls	2026-01-12 11:07:56 +01:00

7823 Commits (All Branches)