llama.cpp

Commit Graph

Author	SHA1	Message	Date
Pascal	5a4e4f4189	chore: update webui build output	2026-02-01 04:13:48 +01:00
Pascal	42244c0162	fix: also skip image attachments in message history for non-vision backends	2026-02-01 04:13:37 +01:00
Pascal	6b7e6f18a6	chore: update webui build output	2026-02-01 03:22:09 +01:00
Pascal	893dbb058a	fix: skip sending image attachments to non-vision backends	2026-02-01 03:20:36 +01:00
Pascal	556029eee6	chore: update webui build output	2026-01-31 08:27:11 +01:00
Pascal	1384352484	fix: responsive MCP server cards, prioritize server name over version	2026-01-31 08:22:41 +01:00
Pascal	1615b1c58c	fix: responsive MCP server cards for mobile viewports	2026-01-31 07:58:47 +01:00
Pascal	cd8e5741f2	chore: update webui build output	2026-01-30 20:23:45 +01:00
Pascal	b872838329	webui: adaptive model selector dropdown width Make model selector dropdown responsive: - Mobile: full width (w-full max-w-[100vw]) - Desktop: adapts to longest model name (sm:w-max) - Replace TruncatedText with responsive span (truncate on mobile, full text on desktop via sm:overflow-visible sm:whitespace-nowrap) - Center status icons in fixed 24px wrapper to prevent layout shifts - Add sm:pr-2 padding between text and icon zone on desktop Fixes dropdown cutting off long model names on desktop while maintaining full-width display on mobile with proper text truncation	2026-01-30 20:21:05 +01:00
Aleksander Grygier	120ada3616	chore: update webui build output	2026-01-29 16:31:07 +01:00
Aleksander Grygier	e41f70bb47	refactor: Use CORS Proxy for favicons calls	2026-01-29 16:30:10 +01:00
Aleksander Grygier	46c5bca942	refactor: Proxy utility	2026-01-29 16:29:04 +01:00
Aleksander Grygier	944765138e	chore: update webui build output	2026-01-29 15:03:00 +01:00
Aleksander Grygier	536c6866e3	feat: Integrate with `llama-server` proxy + improve MCP Server Edit Form	2026-01-29 14:59:28 +01:00
Aleksander Grygier	406cb1dd99	Merge remote-tracking branch 'ngxson/xsn/cors_proxy_demo' into allozaur/mcp-mvp	2026-01-29 13:34:20 +01:00
Aleksander Grygier	9d6e210a5e	Merge remote-tracking branch 'ggml-org/master' into allozaur/mcp-mvp	2026-01-29 13:21:44 +01:00
Aleksander Grygier	7b00b46a6a	chore: update webui build output	2026-01-29 12:55:45 +01:00
Aleksander Grygier	6793c7daac	fix: Checking for capabilities from store	2026-01-29 12:45:10 +01:00
Aleksander Grygier	2aa704b821	refactor: Cleanup	2026-01-29 11:44:08 +01:00
yulo	f3dd7b8e68	HIP: add mmf for CDNA (#18896) * refactor mmf rows_per_block * speed up compile * pass cdna compile * fix cuda error * clean up mmf * f32 mmf * clean float mma * fix mmf error * faster mmf * extend tile k * fix compile error * Revert "extend tile k" This reverts commit `4d2ef3d483`. * fix smem overflow * speed up compiling mmf * speed up compile for hip * 512 block for cdna * config pad size * fix as comment * update select logic * move some code to cuh * fix as comment * correct cdna3 config --------- Co-authored-by: zhang hui <you@example.com>	2026-01-29 11:10:53 +01:00
Georgi Gerganov	eed25bc6b0	arg : add -kvu to llama-batched-bench (#19172)	2026-01-29 08:50:47 +02:00
Vishal Singh	b33df266d0	ggml-zendnn : resolve ZenDNN backend cross-module symbol dependency (#19159)	2026-01-29 12:28:57 +08:00
Aman Gupta	3bcc990997	CUDA: refactor topk-moe to enable more models (GLM 4.7, Nemotron etc.) (#19126)	2026-01-29 10:31:28 +08:00
Neo Zhang	d4964a7c66	sycl: fix norm kernels: l2_norm, group_norm, rms_norm by remove assert to support more cases (#19154) Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2026-01-29 09:20:22 +08:00
Sigbjørn Skjæret	50e8962f79	ci : find latest release with asset for winget (#19161)	2026-01-28 22:05:39 +01:00
Aleksander Grygier	c7b7fc6c15	chore: update webui build output	2026-01-28 19:57:18 +01:00
Aleksander Grygier	d9e82b7c29	fix: Linter errors	2026-01-28 19:55:44 +01:00
Ruben Ortlam	f6b533d898	Vulkan Flash Attention Coopmat1 Refactor (#19075) * vulkan: use coopmat for flash attention pv matrix multiplication fix P loading issue * fix barrier position * remove reduction that is no longer needed * move max thread reduction into loop * remove osh padding * add bounds checks and padding * remove unused code * fix shmem sizes, loop duration and accesses * don't overwrite Qf, add new shared psh buffer instead * add missing bounds checks * use subgroup reductions * optimize * move bounds check, reduce barriers * support other Bc values and other subgroup sizes * remove D_split * replace Of register array with shared memory Ofsh array * parallelize HSV across the rowgroups * go back to Of in registers, not shmem * vectorize sfsh * don't store entire K tile in shmem * fixes * load large k tiles to shmem on Nvidia * adapt shared memory host check function to shader changes * remove Bc 32 case * remove unused variable * fix missing mask reduction tmspsh barrier * fix mask bounds check * fix rowmax f16 under/overflow to inf * fix flash_attn_cm2 BLOCK_SIZE preprocessor directives	2026-01-28 18:52:45 +01:00
Sascha Rogmann	72d3b1898a	spec : add self‑speculative decoding (no draft model required) + refactor (#18471) * server: introduce self-speculative decoding * server: moved self-call into speculative.cpp * can_speculate() includes self-speculation Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * server: can_speculate() tests self-spec * server: replace can_speculate() with slot.can_speculate() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * common: use %zu format specifier for size_t in logging Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * server: can_speculate() requires a task instance * common: ngram map, config self-speculative decoding * common: add enum common_speculative_type * common: add vector of speculative states * common: add option --spec-draftless * server: cleanup (remove slot.batch_spec, rename) * common: moved self-spec impl to ngram-map * common: cleanup (use common_speculative_state_draft) * spec : refactor * cont : naming * spec: remove --spec-config * doc: (draftless) speculative decoding * common: print performance in spec decoding * minor : cleanup * common : better names * minor : cleanup + fix build * minor: comments * CODEOWNERS: add common/ngram-map.* (#18471) * common : rename speculative.draftless_type -> speculative.type * ngram-map : fix uninitialized values * ngram-map : take into account the input can become shorter * ngram-map : revert len check for now * arg : change `--spec-draftless` -> `--spec-type` * spec : add common_speculative_state::accept() * spec : refactor + add common_speculative_begin() * spec : fix begin() call with mtmd * spec : additional refactor + remove common_speculative_params --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-28 19:42:42 +02:00
Aleksander Grygier	7c9be63a74	refactor: Refine Chat Message Processing State Display	2026-01-28 18:31:37 +01:00
Aleksander Grygier	5a176d1893	feat: Chat logic improvements	2026-01-28 18:31:37 +01:00
Aleksander Grygier	aa7089d598	feat: Integrate Resource Attachments into Chat Form UI	2026-01-28 18:31:37 +01:00
Aleksander Grygier	23e4ef7495	feat: MCP Resources UI feat: Implement MCP Resource Selection Dialog	2026-01-28 18:31:37 +01:00
Aleksander Grygier	1623547e2b	feat: Integrate Resource Store into Main MCP Store	2026-01-28 18:31:36 +01:00
Aleksander Grygier	dc2076a77c	feat: MCP Resources Svelte Store	2026-01-28 18:31:36 +01:00
Aleksander Grygier	192c920d73	refactor: Use constants	2026-01-28 18:31:35 +01:00
Aleksander Grygier	89166a79d4	feat: Introduce MCP Resource Types and Service Methods	2026-01-28 18:31:35 +01:00
Aleksander Grygier	85a61a7c96	refactor: Componentize HorizontalScrollCarousel	2026-01-28 17:32:59 +01:00
Aleksander Grygier	bfbcdc7420	fix: Code Preview sandbox	2026-01-28 17:31:04 +01:00
Daniel Bevenius	ebf5725870	convert : yield Mamba2Model/GraniteMoeModel modify_tensors (#19157) * convert : yield Mamba2Model/GraniteMoeModel modify_tensors This commit updates the `GraniteHybridModel` class' modify_tensors function to properly delegate to `Mamba2Model.modify_tensors` and `GraniteMoeModel.modify_tensors` using 'yield from' instead of 'return'. The motivation for this is that modify_tensors is a generator function (it uses 'yield from'), but the two calls above use return statements and don't yield anything, which means that the caller of this function will not receive any yielded values from it. This causes layer tensors to be silently dropped during conversion.	2026-01-28 16:49:36 +01:00
Patryk Kaminski	0cd7032ca4	ggml-sycl: remove unused syclcompat header (#19140) The syclcompat/math.hpp is not used anymore. The change that introduced it was successfully reverted (https://github.com/ggml-org/llama.cpp/pull/17826). This include path will become obsolete and be dropped in oneAPI 2026.0, effectively breaking ggml-sycl builds.	2026-01-28 23:33:54 +08:00
Sigbjørn Skjæret	60368e1d73	jinja : undefined should be treated as sequence/iterable (return string/array) by filters/tests (#19147) * undefined is treated as iterable (string/array) by filters `tojson` is not a supported `undefined` filter * add tests * add sequence and iterable tests keep it DRY and fix some types	2026-01-28 14:40:29 +01:00
Oleksandr Kuvshynov	88d23ad515	vulkan: handle device dedup on MacOS + Vega II Duo cards (#19058) Deduplication here relied on the fact that Vulkan would return a unique UUID for different physical GPUs. It is at the moment not always the case. On Mac Pro 2019 running macOS, with 2 Vega II Duo cards (so, 4 GPUs total), MoltenVK would assign the same UUID to pairs of GPUs, unless they are connected with Infinity Fabric. See more details here: KhronosGroup/MoltenVK#2683. The right way is to fix that in MoltenVK, but until it is fixed, llama.cpp would only recognize 2 of 4 GPUs in such a configuration. The deduplication logic here is changed to only filter GPUs if the UUID is the same but the driver is different.	2026-01-28 12:35:54 +01:00
Ben Chen	0a95026da9	doc: add build instruction to use Vulkan backend on macos (#19029)	2026-01-28 12:30:16 +01:00
Kevin Pouget	b7feacf7f3	ggml: new backend for Virglrenderer API Remoting acceleration (v2) (#18718)	2026-01-28 17:49:40 +08:00
Alberto Cabrera Pérez	6ad70c5a77	ggml-cpu: arm64: Q4_K scale unroll and vectorization (#19108)	2026-01-28 09:15:56 +02:00
Georgi Gerganov	631cbfcc7a	cuda : fix "V is K view" check for non-unified KV cache (#19145)	2026-01-28 09:15:27 +02:00
Georgi Gerganov	2eee6c866c	CUDA: tune GLM 4.7 Flash FA kernel selection logic (DGX Spark) (#19142)	2026-01-28 09:15:11 +02:00
Georgi Gerganov	b931f81b5a	server : adjust spec tests to generate up to 16 tokens (#19093)	2026-01-28 09:11:40 +02:00
Georgi Gerganov	c5c64f72ac	llama : disable Direct IO by default (#19109) * llama : disable Direct IO by default * cont : override mmap if supported	2026-01-28 09:11:13 +02:00
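
Note on ebf5725870 (convert : yield Mamba2Model/GraniteMoeModel modify_tensors): the bug class described in that commit message can be reproduced in a few lines of plain Python. The sketch below is illustrative only. The function names, the tensor tuple type, and the `use_base` flag are hypothetical stand-ins, not the actual converter classes or signatures. It shows that a `return <generator>` inside a generator function yields nothing to the caller, while `yield from` forwards every item.

```python
from typing import Iterator, Tuple

# Hypothetical stand-in for a (tensor name, tensor data) pair.
Tensor = Tuple[str, int]


def base_modify(name: str, data: int) -> Iterator[Tensor]:
    """Stand-in for a parent class's generator-style modify_tensors."""
    yield (name + ".renamed", data)


def buggy_modify(name: str, data: int, use_base: bool) -> Iterator[Tensor]:
    # The 'yield' further down makes this whole function a generator, so
    # 'return <value>' only ends iteration; the returned generator object
    # is attached to StopIteration and never reaches the caller as items.
    if use_base:
        return base_modify(name, data)
    yield (name, data)


def fixed_modify(name: str, data: int, use_base: bool) -> Iterator[Tensor]:
    # 'yield from' forwards every item produced by the delegate generator.
    if use_base:
        yield from base_modify(name, data)
        return
    yield (name, data)


buggy = list(buggy_modify("blk.0.attn", 1, use_base=True))
fixed = list(fixed_modify("blk.0.attn", 1, use_base=True))
print(buggy)  # the delegated tensor is silently dropped
print(fixed)  # the delegated tensor is forwarded
```

In the buggy variant the caller sees an empty sequence and no exception is raised, which matches the "silently dropped" behaviour the commit message describes.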

8119 Commits

All Branches