Commit Graph

13 Commits

Author SHA1 Message Date
Gaurav Garg aa8b62105c Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Fix compilation errors. 2026-02-16 15:39:26 +05:30
Johannes Gäßler 98ab6727e4 arbitrary num. of GPUs/tensor split 2026-02-13 14:10:50 +01:00
Johannes Gäßler 9c7d45c0fc fix view_offs scaling 2026-02-13 11:22:34 +01:00
Johannes Gäßler 31e4f189bb support for tensor dims % n_devs != 0 2026-02-13 00:40:00 +01:00
Johannes Gäßler 3fdd0b7a6e 2d tensor set/get support 2026-02-11 19:56:35 +01:00
Johannes Gäßler 4dc3d10e80 Remove shfl and AllReduce from backend interface 2026-02-11 14:51:37 +01:00
Johannes Gäßler 8de41b5b40 NCCL support 2026-02-11 14:12:33 +01:00
Johannes Gäßler c531444411 fix output pattern 2026-02-11 14:12:33 +01:00
Johannes Gäßler c925563499 re-use buffers + ggml contexts 2026-02-11 14:12:33 +01:00
Johannes Gäßler 2ffa49decc add support for 4/8 GPUs 2026-02-11 14:12:33 +01:00
Johannes Gäßler 4b8aa26650 partial Vulkan fix 2026-02-11 14:12:33 +01:00
Johannes Gäßler ab69c58aaa support for GPT-OSS, Qwen 3 MoE 2026-02-11 14:12:33 +01:00
Johannes Gäßler a0d9dd20ee ggml: backend-agnostic tensor parallelism 2026-02-11 14:12:33 +01:00