llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aldehir Rojas	c96f608d98	common: consolidate PEG string parsers (#20263 ) * common : consolidate PEG string parsers * cont : fix json_string_content()	2026-03-10 00:29:21 +01:00
Xuan-Son Nguyen	0842b9b465	model: fix step3.5 n_rot (#20318 )	2026-03-09 23:42:24 +01:00
Xuan-Son Nguyen	59db9a357d	llama: dynamic head_dim and n_rot for SWA (#20301 ) * llama: dynamic head_dim and n_rot for SWA * also add gguf_writer wrappers * fix build * build_rope_shift arg reorder	2026-03-09 22:22:39 +01:00
Evan Huus	23fbfcb1ad	server: Parse port numbers from MCP server URLs in CORS proxy (#20208 ) * Parse port numbers from MCP server URLs * Pass scheme to http proxy for determining whether to use SSL * Fix download on non-standard port and re-add port to logging * add test --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2026-03-09 17:47:54 +01:00
Paul Flynn	e22cd0aa15	metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250 ) Enable mul_mv_ext small-batch kernels (BS 2-8) for BF16, Q2_K, and Q3_K quantization types. These types previously fell through to the slower single-row mul_mv path. BF16 uses the float4 dequantize path (like F16). Q2_K and Q3_K use the float4x4 K-quant path (like Q4_K/Q5_K/Q6_K). Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-09 16:48:12 +02:00
Georgi Gerganov	96cfc4992c	server : fix checkpoints n_tokens calculation (#20287 )	2026-03-09 16:47:06 +02:00
Georgi Gerganov	ed0007aa32	metal : add upscale (#20284 )	2026-03-09 16:45:11 +02:00
Georgi Gerganov	344ee2a38a	server : warn swa-full is not supported for non-SWA models (#20291 )	2026-03-09 16:44:25 +02:00
Georgi Gerganov	d6e1556499	server : fix off-by-1 in server_tokens::size_up_to_pos() (#20279 ) * server : fix off-by-1 in server_tokens::size_up_to_pos() * cont : fix typo [no ci]	2026-03-09 16:43:38 +02:00
Piotr Wilkin (ilintar)	f76565db92	common: map developer role to system (#20215 ) * Map developer role to system * Simplify	2026-03-09 14:25:11 +01:00
Georgi Gerganov	43e1cbd6c1	models : fix assert in mamba2 graph (#20270 )	2026-03-09 13:15:15 +02:00
Georgi Gerganov	107d599952	server : add kill switch when server is stuck (#20277 )	2026-03-09 10:33:12 +02:00
Aman Gupta	e8bbc736cb	ggml-cuda: disable gdn for musa (#20278 )	2026-03-09 16:15:36 +08:00
ddh0	b518195101	llama-quant : left-align tensor names in output (#20117 )	2026-03-09 09:28:41 +02:00
Aman Gupta	e2763a6723	contributing: limit open PRs for new contributors to 1 (#20036 )	2026-03-09 15:05:34 +08:00
Bertay Eren	0beb8db3a0	ggml-vulkan: add SGN operator, auto-generate Vulkan.csv and ops.md (#20219 )	2026-03-09 07:24:16 +01:00
Ruben Ortlam	b2f460bd3c	vulkan: skip zero size tensors in backend copies (#20233 )	2026-03-09 07:23:45 +01:00
Michael Huang	5f4cdac385	cuda : display total and free VRAM capacity during device initialization (#20185 )	2026-03-09 12:45:43 +08:00
Aaron Teo	ae87863dc1	llama-bench: introduce `-hf` and `-hff` flags & use `--mmap 1` by default (#20211 )	2026-03-09 09:05:44 +08:00
Piotr Wilkin (ilintar)	97c64fbdbd	PEG parser for LFM2 (#20251 ) * PEG parser for LFM2 * Simplify using python_value()	2026-03-09 01:11:22 +01:00
Georgi Gerganov	d417bc43dd	server : do not create checkpoints right after mtmd chunks (#20232 )	2026-03-08 22:16:46 +02:00
Sigbjørn Skjæret	35bee031e1	graph : remove redundant scale_w parameter (#20235 )	2026-03-08 18:58:28 +01:00
Aldehir Rojas	451ef08432	common : gracefully handle incomplete output (#20191 ) * common : handle incomplete UTF-8 at end of input in PEG parser * cont : if reached end prematurely, emit needs_more_input to propagate partial output * cont: refactor peg parse context to add lenient flag * cont : remove partial flag, keep lenient flag	2026-03-08 17:17:02 +01:00
Piotr Wilkin (ilintar)	9b24886f78	Fix compile bug (#20203 ) * Fix compile bug * Update common/chat-auto-parser-helpers.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-03-08 17:15:49 +01:00
Piotr Wilkin (ilintar)	62b8143ad2	Fix structured outputs (#20223 ) * Fix structured outputs * Update common/chat-auto-parser-generator.cpp Co-authored-by: Aldehir Rojas <hello@alde.dev> --------- Co-authored-by: Aldehir Rojas <hello@alde.dev>	2026-03-08 17:14:43 +01:00
GiantPrince	d088d5b74f	ggml-vulkan: Add ELU op support (#20183 ) * ggml-Vulkan: add ELU support * ggml-Vulkan: remove extra spaces and variables * ggml-Vulkan: fix format issue * ggml-Vulkan: fix format issue * fix whitespace issue * Update Vulkan.csv and ops.md	2026-03-08 12:38:17 +01:00
Jeff Bolz	cd18a50ea5	vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084 ) * vulkan: Fix data races in coopmat1 mul_mat(_id) Add barriers between coopmat store and regular loads. We sort of got away with this because it was the same subgroup accessing the values, but it's still a race and may not work. * switch to subgroup control barriers	2026-03-08 12:33:48 +01:00
Johannes Gäßler	a976ff081b	llama: end-to-end tests (#19802 ) * tests: add end-to-end tests per model architecture * fixup for rebase * fix use-after-free in llama-model-loader.cpp * fix CI * fix WebGPU * fix CI * disable CI for macOS-latest-cmake-arm64 * use expert_weights_scale only if != 0.0f * comments	2026-03-08 12:30:21 +01:00
Christopher Maher	a95047979a	readme : update infra list (#20212 )	2026-03-08 12:42:28 +02:00
Piotr Wilkin (ilintar)	b283f6d5b3	Revert to OAI-compatible args (#20213 ) * Revert to OAI-compatible args * Apply workaround::func_args_not_string	2026-03-08 11:33:03 +01:00
decahedron1	ff52ee964d	server : correct index on finish in OAI completion streams (#20226 )	2026-03-08 10:08:57 +01:00
Neo Zhang	213c4a0b81	[SYCL] support Flash Attention for fp32/fp16/Q4/Q5/Q8 (#20190 ) * support flash-attention for fp32/fp16/Q4/Q5/Q8 * rm warning * update for JIT	2026-03-08 12:00:07 +08:00
Aman Gupta	c5a778891b	ggml: add GATED_DELTA_NET op (#19504 ) * ggml: add GATED_DELTA_NET op * remove the transpose * add KDA * add qwen35 dense * llama : check for fused gated delta net backend support --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-03-07 15:41:10 +08:00
lhez	6fce5c6a7d	opencl: add l2_norm (#20160 )	2026-03-06 18:03:05 -08:00
Piotr Wilkin (ilintar)	c024d85908	Autoparser: True streaming (#20177 ) * Relax atomicity constraint for nicer, more pleasant, True Streaming parsing * Whitespace * Remove redundant atomics	2026-03-07 01:55:33 +01:00
Piotr Wilkin (ilintar)	2f2923f895	Autoparser: add optional argument reshuffle capability (#20171 ) * Allow reshuffled arguments in tagged argument parser format tool calls. * Remove shuffle just keep the optional parsers in any order * Remove unnecessary import	2026-03-06 22:34:15 +01:00
Bartowski	649f06481e	quants : Add memsets and other fixes for IQ quants (#19861 ) * Add memsets and other fixes for IQ quants * Make memset unconditional, change Laux back to L * Move another memset	2026-03-06 23:06:56 +02:00
Piotr Wilkin (ilintar)	7463687161	Add @pwilkin to CODEOWNERS for autoparser code (#20174 )	2026-03-06 21:25:41 +01:00
Piotr Wilkin (ilintar)	566059a26b	Autoparser - complete refactoring of parser architecture (#18675 ) * Autoparser - full single commit squish * Final pre-merge changes: minor fixes, Kimi 2.5 model parser	2026-03-06 21:01:00 +01:00
Todor Boinovski	34df42f7be	hexagon: add f32 ssm_conv op (#20122 ) * hexagon: add ssm_conv op * hexagon: hvx kernel is functional * hexagon: improvements to ssm-conv hvx kernel * hexagon: added dma to ssm-conv hvx kernel * hexagon: ssm-conv dynamically compute gather scratchpad * hex-ssm-conv: add local context and fix various issues (spad indexing, etc) --------- Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>	2026-03-06 09:59:26 -08:00
Tom Vaucourt	e68f2fb894	server : preserve anthropic thinking blocks in conversion (#20120 ) * server : preserve anthropic thinking blocks in conversion (#20090) * server : add tests for anthropic thinking block conversion --------- Co-authored-by: root <root@llamacpp.home>	2026-03-06 17:41:12 +01:00
Max Krasnyansky	ba2fd11cdf	cpu: skip redundant ROPE cache updates (#20149 )	2026-03-06 08:32:40 -08:00
Aman Gupta	d48e876467	ggml-cuda: add mem check for fusion (#19916 ) * ggml-cuda: add mem check for fusion * Replace NaNs with -FLT_MAX * fix typo Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-03-07 00:05:43 +08:00
Aaron Teo	ba2ff79e43	ggml: update comments for backends which have no memory to report (#20157 ) Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2026-03-06 23:24:38 +08:00
shalinib-ibm	c6980ff29d	ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083 ) (#20130 ) This patch addresses an Internal Compiler Error (Segmentation fault) observed with gcc 15 by replacing the intrinsic + cast by doing a cast on the data first and then calling the intrinsic. This bypasses the buggy compiler path while maintaining identical instruction selection. Performance Verification: Assembly analysis on RHEL 9 (GCC 15.1.1) confirms that both the original code and this fix generate the identical Power10 prefixed load instruction: `plxv 40, 2(14)` This ensures zero performance regression while unblocking builds on newer toolchains. Reproduced on: - Alpine Linux + GCC 15.2.0-r2 - RHEL 9 + GCC 15.1.1 (gcc-toolset-15) Signed-off-by: Shalini Salomi Bodapati <Shalini.Salomi.Bodapati@ibm.com>	2026-03-06 23:22:39 +08:00
Aman Gupta	1e38a7a6fa	CUDA: use shared mem for ssm_conv (#20128 ) * CUDA: use shared mem for ssm_conv * fuse silu + ssm_conv * fuse unary + mul * enable for fp16 * formatting Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2026-03-06 23:09:59 +08:00
Tim Neumann	388baabc06	context: ignore zero scale LoRAs when checking sameness (#20166 )	2026-03-06 15:05:52 +02:00
Piotr Wilkin (ilintar)	f5ddcd1696	Checkpoint every n tokens: squash (#20087 )	2026-03-06 11:39:26 +01:00
Aleksander Grygier	f6235a41ef	webui: Agentic Loop + MCP Client with support for Tools, Resources and Prompts (#18655 )	2026-03-06 10:00:39 +01:00
Johannes Gäßler	2850bc6a13	ggml-cpu: fix data race for debug asserts (#20148 )	2026-03-06 09:12:49 +01:00

8265 Commits
