Commit Graph

13 Commits

Author SHA1 Message Date
Gaurav Garg aa8b62105c Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Fix compilation errors. 2026-02-16 15:39:26 +05:30
Johannes Gäßler 98ab6727e4 arbitrary num. of GPUs/tensor split 2026-02-13 14:10:50 +01:00
Johannes Gäßler 9c7d45c0fc fix view_offs scaling 2026-02-13 11:22:34 +01:00
Johannes Gäßler 31e4f189bb support for tensor dims % n_devs != 0 2026-02-13 00:40:00 +01:00
Johannes Gäßler 3fdd0b7a6e 2d tensor set/get support 2026-02-11 19:56:35 +01:00
Johannes Gäßler 4dc3d10e80 Remove shfl and AllReduce from backend interface 2026-02-11 14:51:37 +01:00
Johannes Gäßler 8de41b5b40 NCCL support 2026-02-11 14:12:33 +01:00
Johannes Gäßler c531444411 fix output pattern 2026-02-11 14:12:33 +01:00
Johannes Gäßler c925563499 re-use buffers + ggml contexts 2026-02-11 14:12:33 +01:00
Johannes Gäßler 2ffa49decc add support for 4/8 GPUs 2026-02-11 14:12:33 +01:00
Johannes Gäßler 4b8aa26650 partial Vulkan fix 2026-02-11 14:12:33 +01:00
Johannes Gäßler ab69c58aaa support for GPT-OSS, Qwen 3 MoE 2026-02-11 14:12:33 +01:00
Johannes Gäßler a0d9dd20ee ggml: backend-agnostic tensor parallelism 2026-02-11 14:12:33 +01:00