llama.cpp

Commit Graph

Author	SHA1	Message	Date
Reese Levine	8ced5f41f9	Move to no timeout for WaitAny in graph submission to avoid deadlocks in some cases on llvm-pipe backends (#20618 )	2026-03-18 10:23:47 -07:00
Shaw Nguyen	78d550b541	ggml-cpu/x86: fix unused changemask warning in repack (#20692 )	2026-03-18 18:45:06 +02:00
Georgi Gerganov	4efd326e71	sync : ggml	2026-03-18 15:17:28 +02:00
Georgi Gerganov	b08f7322ee	ggml : bump version to 0.9.8 (ggml/1442)	2026-03-18 15:17:28 +02:00
Georgi Gerganov	79187f2fb8	ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441)	2026-03-18 15:17:28 +02:00
Julien Chaumond	48e61238e1	webui: improve tooltip wording for attachment requirements (#20688 ) * webui: improve tooltip wording for attachment requirements Co-Authored-By: Claude <Agents+claude@huggingface.co> * chore: update webui build output * chore: update webui build output --------- Co-authored-by: Claude <Agents+claude@huggingface.co>	2026-03-18 14:01:02 +01:00
Pop Flamingo	312cf03328	llama : re-enable manual LoRA adapter free (#19983 ) * Re-enable manual LoRA adapter free * Remove stale "all adapters must be loaded before context creation" comments	2026-03-18 12:03:26 +02:00
Masato Nakasaka	f4049ad735	tests : fix test-jinja-py Windows failures by bypassing command-line args [no ci] (#20483 ) * Fix errors occurring on Windows * Reverted fix, #20365 will take care of CRLF issue * Changed to write directly to stdin * Prevent fclose from happening twice	2026-03-18 10:43:31 +01:00
Aldehir Rojas	5e8910a0db	common : rework gpt-oss parser (#20393 ) * common : rework gpt-oss parser * cont : fix gpt-oss tests * cont : add structured output test * cont : rename final to final_msg	2026-03-18 10:41:25 +01:00
Aaron Teo	fe00a84b4b	tests: enable kv_unified to prevent cuda oom error on rtx 2060 (#20645 ) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2026-03-18 17:40:22 +08:00
Aleksander Grygier	7ab321d40d	webui: Fix duplicated messages on q param (#20715 ) * fix: Remove duplicate message sending on `?q` param * chore: update webui build output	2026-03-18 10:32:43 +01:00
uvos	7533a7d509	HIP : ignore return of hipMemAdvise [no ci] (#20696 )	2026-03-18 09:53:13 +01:00
Andreas Obersteiner	a69d54f990	context : fix graph not resetting when control vector changes (#20381 )	2026-03-18 08:10:13 +02:00
Krishna Sridhar	cf23ee2447	hexagon: add neg, exp, sigmoid, softplus ops, cont, repeat ops (#20701 ) Add element-wise unary ops needed by Qwen 3.5's DeltaNet linear attention layers. These ops follow the existing unary-ops pattern with VTCM DMA double-buffering. - neg: negate via scale by -1.0 - exp: uses existing hvx_exp_f32 HVX intrinsics - sigmoid: uses existing hvx_sigmoid_f32_aa HVX intrinsics - softplus: log(1 + exp(x)) scalar fallback - CONT reuses the existing CPY infrastructure since making a tensor contiguous is equivalent to a same-type copy. - REPEAT implements tiled memory copy with multi-threaded execution via the worker pool, supporting f32 and f16 types. The kernel parallelizes across output rows and uses memcpy for each tile. Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-03-17 15:34:36 -07:00
Ruben Ortlam	892e3c333a	vulkan: disable mmvq on Intel Windows driver (#20672 ) * vulkan: disable mmvq on Intel Windows driver * improve comment	2026-03-17 21:51:43 +01:00
Kevin Hannon	ee4801e5a6	ggml-blas: set mkl threads from thread context (#20602 ) * ggml blas: set mkl threads from thread context * add code to run blas locally	2026-03-18 01:16:49 +08:00
Piotr Wilkin (ilintar)	d2ecd2d1cf	common/parser: add `--skip-chat-parsing` to force a pure content parser. (#20289 ) * Add `--force-pure-content` to force a pure content parser. * Update common/arg.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Change parameter name [no ci] --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-17 16:16:43 +01:00
Taimur Ahmad	054d8b0f24	ggml-cpu: fix RVV checks in quants and repacking (#20682 ) * ggml-cpu: refactor quants.c; add rvv check * ggml-cpu: refactor; disable generic fallback	2026-03-17 16:03:40 +02:00
Sigbjørn Skjæret	ab0bb93748	ci : bump ccache [no ci] (#20679 ) * bump ccache * forgotten * disable for s390x * disable also for ppc64le	2026-03-17 14:54:31 +01:00
Ruben Ortlam	3a5cb629b1	vulkan: async and event fixes (#20518 ) * vulkan: fix event wait submission, event command buffer reset * fix event command buffer reset validation error * also reset command buffers before reuse * use timeline semaphores instead of fences for event_synchronize * don't use initializer list for semaphore wait info * use multiple events to avoid reset issues * fix event reuse issue with multiple vectors * add semaphore wait condition also if compute_ctx already exists * remove event pending stage	2026-03-17 14:27:23 +01:00
Georgi Gerganov	8cc2d81264	server : fix ctx checkpoint invalidation (#20671 )	2026-03-17 15:21:14 +02:00
Justin Bradford	627670601a	kleidiai : fix MUL_MAT support for batched (3D) inputs (#20620 ) * kleidiai : fix MUL_MAT support for batched (3D) inputs The supports_op() check incorrectly rejected MUL_MAT operations with 3D inputs (ne[2] > 1), but the actual compute_forward_qx() implementation handles batched inputs correctly via a loop over ne12. This caused models with Q4_0/Q8_0 weights to crash during graph scheduling when n_seq_max > 1, because weights were placed in KLEIDIAI buffers during loading (tested with 2D inputs) but the runtime used 3D inputs. Also relax the buffer check to allow supports_op() to be called during weight loading when src[0]->buffer is NULL. Fixes #20608 * Kleidiai support_ops should only return true for 3D inputs, not also 4D	2026-03-17 14:03:54 +02:00
Ruben Ortlam	740a447fc3	vulkan: allow graphics queue only through env var (#20599 ) * vulkan: avoid graphics queue on non-RADV AMD drivers * avoid graphics queues on small GPUs * change to only use graphics queue if overridden with env var GGML_VK_ALLOW_GRAPHICS_QUEUE * reenable transfer queue if graphics queue is not used	2026-03-17 10:09:59 +01:00
Neo Zhang	b6c83aad55	[SYCL] enhance UPSCALE to support all UT cases (#20637 ) * [SYCL] enhance UPSCALE to support more cases * rm test case result of SYCL1	2026-03-17 10:01:52 +08:00
Piotr Wilkin (ilintar)	2e4a6edd4a	tools/server: support refusal content for Responses API (#20285 ) * Support refusal content for Responses API * Update tools/server/server-common.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update tools/server/server-common.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-17 01:42:04 +01:00
Xuan-Son Nguyen	d34ff7eb5b	model: mistral small 4 support (#20649 ) * model: mistral small 4 support * fix test * fix test (2) * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * change newline --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-17 00:31:14 +01:00
Georgi Gerganov	45172df4d6	ci : disable AMX jobs (#20654 ) [no ci]	2026-03-16 22:38:59 +02:00
Georgi Gerganov	9b342d0a9f	benches : add Nemotron 3 Nano on DGX Spark (#20652 ) [no ci]	2026-03-16 21:50:43 +02:00
Sigbjørn Skjæret	55e87026f7	tests : write to binary buffer to avoid newline translation in jinja -py [no ci] (#20365 )	2026-03-16 20:40:22 +01:00
Martin Klacer	cf21cdf36c	kleidiai: add data type check to get_tensor_traits (#20639 ) * kleidiai: add data type check to get_tensor_traits * Added check for F16 data type into get_tensor_traits path with input data not in ggml_backend_cpu_kleidiai_buffer_type format (unsupported for Q4/8) Signed-off-by: Martin Klacer <martin.klacer@arm.com> Change-Id: I9aca4b9b8d669d35db6f1dbcc4e080b1919b1de7 * updated ggml/src/ggml-cpu/kleidiai/kleidiai.cpp updated kleidiai.cpp file as per suggestion Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: Martin Klacer <martin.klacer@arm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-16 21:25:54 +02:00
Sigbjørn Skjæret	0ed992973b	ci : update labeler (#20629 )	2026-03-16 20:24:20 +01:00
Aldehir Rojas	1bbec6a75d	jinja : add capability check for object args (#20612 )	2026-03-16 17:43:14 +01:00
Georgi Gerganov	f47a246a08	sync : ggml	2026-03-16 17:22:06 +02:00
Georgi Gerganov	c0ccbd1f86	ggml : try fix arm build (whisper/0)	2026-03-16 17:22:06 +02:00
David366AI	f6da02c3f2	ggml : extend im2col f16 (ggml/1434) * examples/yolo: fix load_model memory leak * fix/issue-1433 ggml_compute_forward_im2col_f16 assert error * fix/issue-1433	2026-03-16 17:22:06 +02:00
Pascal	dddca026bf	webui: add model information dialog to router mode (#20600 ) * webui: add model information dialog to router mode * webui: add "Available models" section header in model list * webui: remove nested scrollbar from chat template in model info dialog * chore: update webui build output * feat: UI improvements * refactor: Cleaner rendering + UI docs * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2026-03-16 15:38:11 +01:00
Aman Gupta	3c8521c4f5	llama-graph: replace cont with reshape for alpha in qwen35 (#20640 )	2026-03-16 22:07:13 +08:00
Aleksander Grygier	67a2209fab	webui: Add MCP CORS Proxy detection logic & UI (#20167 ) * refactor: MCP store cleanup * feat: Add MCP proxy availability detection * fix: Sidebar icon * chore: update webui build output * chore: Formatting * chore: update webui build output * chore: Update package lock * chore: update webui build output * chore: update webui build output * chore: update webui build output	2026-03-16 13:05:36 +01:00
Pascal	d65c4f2dc9	Fix model selector locked to first loaded model with multiple models (#20580 ) * webui: fix model selector being locked to first loaded model When multiple models are loaded, the auto-select effect would re-fire on every loadedModelIds change, overriding the user's manual model selection. Guard with selectedModelId so auto-select only kicks in when no model is chosen yet. * chore: update webui build output	2026-03-16 12:04:06 +01:00
Woof Dog	d8c331c0af	webui: use date in more human readable exported filename (#19939 ) * webui: use date in exported filename Move conversation naming and export to utils update index.html.gz * webui: move literals to message export constants file * webui: move export naming and download back to the conversation store * chore: update webui build output * webui: add comments to some constants * chore: update webui build output	2026-03-16 11:18:13 +01:00
Ruben Ortlam	46dba9fce8	vulkan: fix flash attention dot product precision (#20589 )	2026-03-16 10:45:49 +01:00
Sigbjørn Skjæret	de8f01c2d7	model : wire up Nemotron-H tensors for NVFP4 support (#20561 ) * wire up Nemotron-H tensors for NVFP4 support * add ssm tensors * alignment	2026-03-16 09:19:16 +01:00
Richard Davison	079e5a45f0	convert : support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization (#20539 ) * support mixed-precision ModelOpt models with per-tensor NVFP4/FP8 quantization * cleanup * fallback --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-16 09:18:47 +01:00
Masato Nakasaka	d3936498a3	common : fix iterator::end() dereference (#20445 )	2026-03-16 08:50:38 +02:00
Aman Gupta	34818ea6c0	CUDA: GDN hide memory latency (#20537 )	2026-03-16 11:41:45 +08:00
Piotr Wilkin (ilintar)	9e2e2198b0	tools/cli: fix disable reasoning (#20606 )	2026-03-15 22:40:53 +01:00
Georgi Gerganov	88915cb55c	server : fix wait in test_cancel_requests() test (#20601 ) * server : fix wait in test_cancel_requests() test * codeowners : add team for server tests	2026-03-15 20:54:37 +02:00
Sigbjørn Skjæret	ebbf544ed1	sycl : fix for untransposed GDA recurrent state (#20583 )	2026-03-15 19:10:15 +01:00
Sigbjørn Skjæret	b91d7dfe5b	ci : only save openvino caches on github-hosted master (#20593 ) * only save openvino ccache on master * disable toolkit cache if self-hosted * only cache on github-hosted runners * remove toolkit cache [no ci]	2026-03-15 18:58:13 +01:00
Johannes Gäßler	ae40cd27c8	CUDA: limit number of FA stream-k CUDA blocks (#20586 )	2026-03-15 18:30:47 +01:00

8413 Commits