llama.cpp

Commit Graph

Author	SHA1	Message	Date
Imad Saddik	70c10f0cc1	Merge `f62e0d9b4e` into `88915cb55c`	2026-03-15 19:25:21 +00:00
Imad Saddik	f62e0d9b4e	chore: update webui build output	2026-03-15 19:25:09 +00:00
Imad Saddik	9e5d40a550	chore: set autoChatWidth to true by default	2026-03-15 19:18:21 +00:00
Georgi Gerganov	88915cb55c	server : fix wait in test_cancel_requests() test (#20601 ) * server : fix wait in test_cancel_requests() test * codeowners : add team for server tests	2026-03-15 20:54:37 +02:00
Sigbjørn Skjæret	ebbf544ed1	sycl : fix for untransposed GDA recurrent state (#20583 )	2026-03-15 19:10:15 +01:00
Imad Saddik	0dd4d9d588	chore: update webui build output	2026-03-15 18:09:22 +00:00
Imad Saddik	d549ab4893	style: remove max width 48rem from agentic-content	2026-03-15 18:07:01 +00:00
Sigbjørn Skjæret	b91d7dfe5b	ci : only save openvino caches on github-hosted master (#20593 ) * only save openvino ccache on master * disable toolkit cache if self-hosted * only cache on github-hosted runners * remove toolkit cache [no ci]	2026-03-15 18:58:13 +01:00
Imad Saddik	a0eccc8652	chore: update webui build output	2026-03-15 17:55:17 +00:00
Imad Saddik	333bfc7231	chore: undo changes in ChatScreenProcessingInfo	2026-03-15 17:53:31 +00:00
Imad Saddik	c6c63786c2	chore: update webui build output	2026-03-15 17:49:34 +00:00
Imad Saddik	de04a9b0e6	style: reset the width of the processing info div	2026-03-15 17:48:16 +00:00
Imad Saddik	715ba4ee85	chore: update webui build output	2026-03-15 17:40:36 +00:00
Imad Saddik	fa7d3a96c5	fix: keep the container spanning the whole width to fix scroll bar issue	2026-03-15 17:39:19 +00:00
Imad Saddik	c399ec3c46	chore: update webui build output	2026-03-15 17:36:11 +00:00
Imad Saddik	72c1928dc9	refactor: move widthClasses.class to the container div	2026-03-15 17:34:31 +00:00
Johannes Gäßler	ae40cd27c8	CUDA: limit number of FA stream-k CUDA blocks (#20586 )	2026-03-15 18:30:47 +01:00
Imad Saddik	0cd953eea2	chore: update webui build output	2026-03-15 17:13:14 +00:00
Imad Saddik	6074619ba4	style: restore class for checkbox labels	2026-03-15 17:11:58 +00:00
Imad Saddik	297abf8450	chore: update webui build output	2026-03-15 17:09:59 +00:00
Imad Saddik	d4034eff07	fix: update chatWidthClasses to use autoChatWidth configuration	2026-03-15 17:08:43 +00:00
Imad Saddik	b73209694d	chore: update webui build output	2026-03-15 17:03:58 +00:00
Imad Saddik	2630c27754	refactor: simplify chatWidthClasses getter logic and remove widthClasses.class	2026-03-15 17:02:41 +00:00
Imad Saddik	1a6f21f25c	chore: revert package-lock.json to match master	2026-03-15 16:56:41 +00:00
Imad Saddik	95be04617e	chore: update webui build output	2026-03-15 16:56:08 +00:00
Imad Saddik	2836834801	refactor: remove anything related to the custom chat width setting	2026-03-15 16:54:43 +00:00
Imad Saddik	20a8227933	chore: update webui build output	2026-03-15 16:44:35 +00:00
Imad Saddik	89647d5daf	chore: downgrade @lucide/svelte version and remove custom chat width component	2026-03-15 16:43:18 +00:00
Pascal	ceef6b5233	ggml: avoid creating CUDA context during device init (#20595 )	2026-03-16 00:42:56 +08:00
Adrien Gallouët	07c6a59b4f	vendor : update cpp-httplib to 0.38.0 (#20578 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-15 17:30:06 +01:00
MoonShadow	8b7d340b6f	ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain (#20536 ) * ggml/hip: fix APU compatibility - soft error handling for hipMemAdviseSetCoarseGrain On AMD APU/iGPU devices (unified memory architecture), hipMemAdviseSetCoarseGrain returns hipErrorInvalidValue because the hint is not applicable to UMA systems. The previous CUDA_CHECK() call treated this as a fatal error, causing crashes on APU systems such as AMD Strix Halo (gfx1151). Fix: treat hipMemAdviseSetCoarseGrain as an optional performance hint - call it without error checking and clear any resulting error with hipGetLastError(). Also add pre-allocation debug logging (GGML_LOG_DEBUG) to help diagnose memory issues on APU systems, and store totalGlobalMem in device info. Context: AMD APUs on Windows are affected by a ROCm runtime bug that limits hipMallocManaged to ~64GB regardless of available system RAM. A fix has been submitted upstream: https://github.com/ROCm/rocm-systems/pull/4077 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * ggml/hip: remove unrelated changes, keep only hipMemAdviseSetCoarseGrain fix --------- Co-authored-by: moonshadow-25 <moonshadow-25@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-03-15 17:23:58 +01:00
Eric Hsieh	559646472d	fix: prevent nullptr dereference (#20552 )	2026-03-15 16:51:49 +01:00
Sigbjørn Skjæret	cf45437d35	codeowners : use teams (#20526 ) * use teams * update * update * update * update * update	2026-03-15 14:26:10 +01:00
Georgi Gerganov	9cd4ebcfb1	ci : split build.yml + server.yml (#20546 ) * ci : split build.yml * cont : split server.yml * cont : reduce paths * cont : split build-android.yml + update paths * ci : make msys workflows manual (#20588) * ci : make cross-build workflows manual (#20585) * cont : fix release paths Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-15 15:11:17 +02:00
Sigbjørn Skjæret	89d0aec042	convert : support contiguous method on lora tensors (#20489 )	2026-03-15 12:15:12 +01:00
Bartowski	b9da4444df	ggml : guard against sumq2 being 0 in IQ4_NL (#20460 )	2026-03-15 10:47:28 +02:00
PikaPikachu	617db241aa	cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (#19478 ) * mmvq: add RDNA3/RDNA4-specific parameter table (nwarps=8, rows=1) * mmvq: add dedicated RDNA3 parameter table * mmvq: exclude RDNA3.5 (gfx1150/1151) from RDNA3 table	2026-03-15 08:33:39 +01:00
Ruben Ortlam	1a3d8edbba	vulkan: use graphics queue on AMD (#20551 ) * vulkan: use graphics queue on AMD for slightly better performance * disable async transfer queue on AMD	2026-03-15 08:18:54 +01:00
sprayandwipe	6b10a82c00	kv-cache : fix reading llama_kv_cell_ext during state read (#20273 ) Co-authored-by: sid <sid@ragingfist.net>	2026-03-15 09:11:19 +02:00
Michael Wand	d23355afc3	model : wire up Qwen3.5/Qwen3.5MoE tensors for NVFP4 support (#20506 )	2026-03-14 22:44:42 +01:00
Georgi Gerganov	b30a5fdf37	metal : add FA specialization for HSK = 320, HSV = 256 (#20549 )	2026-03-14 23:15:47 +02:00
Georgi Gerganov	b4768955c4	ci : move self-hosted workflows to separate files (#20540 )	2026-03-14 23:15:35 +02:00
Gerard Guillemas Martos	fc350fdf96	docker : force Python 3.13 in Vulkan container (#20530 ) * ci: force Python 3.13 in Vulkan container * remove unnecessary `update-alternatives` line	2026-03-14 21:37:09 +01:00
Eve	3a6f059909	ci : try to optimize some jobs (#20521 ) * force arm version to test * run on either x86 or arm if we can help it, this only works for runs without ccache * readd other jobs * remove ccache	2026-03-14 20:27:52 +01:00
Max Krasnyansky	609ea50026	hexagon: Q4_0 and MXFP4 repack fixes (#20527 ) * hexagon: fix tail corruption with rows sizes not multiple of 256 * hexagon: use different stride for repacking partial blocks * hex-mm: update repack and kernels to avoid shuffles for full 256-element blocks Previous commit changed the repacking to use even:odd (0:1,2:3,..) packing instead of the original (0:128,1:129,...) packing in order to fix tail corruption. Since the mm kernels already deal with partial tails we can use even:odd packing only for the last block. This avoid performance penalty of having to shuffle to zip the elements in the common case. * hex-mm: update rmpy x8 for better optimizations * hex-mm: tighten supported MUL_MAT checks to avoid spurios failures * hex-mm: use vzero to init accumulators * hex-mm: properly call partial rmpy_x8	2026-03-14 11:09:08 -07:00
Georgi Gerganov	9f774e45ee	ci : reduce webgpu tests timeout to 900s (#20538 ) [no ci]	2026-03-14 17:08:26 +02:00
Xuan-Son Nguyen	94d0262277	mtmd: add llama-mtmd-debug binary (#20508 ) * mtmd: add llama-mtmd-debug binary * adapt * fixes * fix compile error * fix windows compile error * rm legacy clip_debug_encode() * add MTMD_API to fix build	2026-03-14 15:52:29 +01:00
Neo Zhang	a93c0ef0fa	add op gated_delta_net (#20455 )	2026-03-14 22:01:57 +08:00
Chedrian07	710878a7dd	webui: restore code preview iframe origin isolation (#20477 )	2026-03-14 11:28:28 +01:00
Adrien Gallouët	0685848bc6	scripts : remove get-wikitext-103.sh (#20543 ) It doesn't work and no one seems to use it. $ wget https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-raw-v1.zip HTTP request sent, awaiting response... 301 Moved Permanently Location: unspecified ERROR: Redirection (301) without location. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-03-14 11:22:04 +01:00

8448 Commits
