Author | Commit | Message | Date
Gaurav Garg | aa8b62105c | Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Fix compilation errors. | 2026-02-16 15:39:26 +05:30
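The pinned-memory rationale in aa8b62105c is the standard CUDA pattern: copies from pageable host memory are staged through an internal driver buffer, while page-locked (pinned) memory can be DMA'd directly, so cudaMemcpyAsync can actually overlap with compute. A minimal sketch of the two allocation paths (illustrative only, not the patch's code):

```c
// Contrast pageable vs. pinned host memory for CUDA transfers.
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 1 << 20;
    float *pageable = malloc(n * sizeof(float));        // regular pageable memory
    float *pinned = NULL;
    cudaMallocHost((void **)&pinned, n * sizeof(float)); // page-locked memory

    float *dev = NULL;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // From pageable memory: the driver stages through an internal pinned
    // buffer, so this "async" copy is effectively synchronous.
    cudaMemcpyAsync(dev, pageable, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    // From pinned memory: a true asynchronous DMA transfer that can
    // overlap with work on other streams.
    cudaMemcpyAsync(dev, pinned, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);

    cudaStreamSynchronize(stream);
    cudaFree(dev);
    cudaFreeHost(pinned);
    free(pageable);
    cudaStreamDestroy(stream);
    return 0;
}
```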
Johannes Gäßler | 98ab6727e4 | arbitrary num. of GPUs/tensor split | 2026-02-13 14:10:50 +01:00
Johannes Gäßler | 9c7d45c0fc | fix view_offs scaling | 2026-02-13 11:22:34 +01:00
Johannes Gäßler | 31e4f189bb | support for tensor dims % n_devs != 0 | 2026-02-13 00:40:00 +01:00
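Supporting tensor dims % n_devs != 0 means per-device slices can differ in size. A common scheme, sketched below with hypothetical names (these are not the actual ggml helpers), gives the first ne % n_devs devices one extra row each:

```c
// Split ne rows across n_devs devices when ne does not divide evenly.
#include <stdint.h>
#include <stdio.h>

static void split_range(int64_t ne, int n_devs, int dev,
                        int64_t *begin, int64_t *end) {
    const int64_t base = ne / n_devs;  // rows every device gets
    const int64_t rem  = ne % n_devs;  // devices that get one extra row
    *begin = dev * base + (dev < rem ? dev : rem);
    *end   = *begin + base + (dev < rem ? 1 : 0);
}

int main(void) {
    int64_t begin, end;
    for (int dev = 0; dev < 3; dev++) {
        split_range(10, 3, dev, &begin, &end); // 10 rows over 3 devices
        printf("device %d: rows [%lld, %lld)\n",
               dev, (long long)begin, (long long)end);
    }
    return 0; // prints [0, 4), [4, 7), [7, 10)
}
```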
Johannes Gäßler | 3fdd0b7a6e | 2d tensor set/get support | 2026-02-11 19:56:35 +01:00
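2D tensor set/get boils down to a strided copy: rows are contiguous, but the source and destination row pitches differ. On CUDA that is exactly what cudaMemcpy2D does; the mapping onto ggml's set/get entry points is an assumption here, only the CUDA calls are the library's real API:

```c
// Strided 2D host<->device copies with differing row pitches.
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void) {
    const size_t width = 64 * sizeof(float);      // bytes per row actually copied
    const size_t rows  = 128;
    const size_t host_pitch = 80 * sizeof(float); // host rows padded to 80 floats

    float *host = malloc(host_pitch * rows);
    float *dev  = NULL;
    size_t dev_pitch = 0;
    cudaMallocPitch((void **)&dev, &dev_pitch, width, rows); // aligned device rows

    // "set": copy a 2D region host -> device, respecting both pitches.
    cudaMemcpy2D(dev, dev_pitch, host, host_pitch, width, rows,
                 cudaMemcpyHostToDevice);
    // "get": copy the same region back device -> host.
    cudaMemcpy2D(host, host_pitch, dev, dev_pitch, width, rows,
                 cudaMemcpyDeviceToHost);

    cudaFree(dev);
    free(host);
    return 0;
}
```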
Johannes Gäßler | 4dc3d10e80 | Remove shfl and AllReduce from backend interface | 2026-02-11 14:51:37 +01:00
Johannes Gäßler | 8de41b5b40 | NCCL support | 2026-02-11 14:12:33 +01:00
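For tensor parallelism, NCCL support typically means summing partial results across GPUs with ncclAllReduce. A minimal single-process sketch (illustrative, not the commit's integration code):

```c
// Sum a buffer across two GPUs with NCCL's all-reduce collective.
#include <nccl.h>
#include <cuda_runtime.h>

int main(void) {
    const int n_devs = 2;
    int devs[2] = {0, 1};
    ncclComm_t comms[2];
    ncclCommInitAll(comms, n_devs, devs); // one communicator per GPU

    float *buf[2];
    cudaStream_t streams[2];
    for (int i = 0; i < n_devs; i++) {
        cudaSetDevice(devs[i]);
        cudaMalloc((void **)&buf[i], 1024 * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    // All calls between GroupStart/GroupEnd are issued as one collective.
    ncclGroupStart();
    for (int i = 0; i < n_devs; i++) {
        ncclAllReduce(buf[i], buf[i], 1024, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    }
    ncclGroupEnd();

    for (int i = 0; i < n_devs; i++) {
        cudaSetDevice(devs[i]);
        cudaStreamSynchronize(streams[i]);
        cudaFree(buf[i]);
        cudaStreamDestroy(streams[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}
```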
Johannes Gäßler | c531444411 | fix output pattern | 2026-02-11 14:12:33 +01:00
Johannes Gäßler | c925563499 | re-use buffers + ggml contexts | 2026-02-11 14:12:33 +01:00
Johannes Gäßler | 2ffa49decc | add support for 4/8 GPUs | 2026-02-11 14:12:33 +01:00
Johannes Gäßler | 4b8aa26650 | partial Vulkan fix | 2026-02-11 14:12:33 +01:00
Johannes Gäßler | ab69c58aaa | support for GPT-OSS, Qwen 3 MoE | 2026-02-11 14:12:33 +01:00
Johannes Gäßler | a0d9dd20ee | ggml: backend-agnostic tensor parallelism | 2026-02-11 14:12:33 +01:00
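The overall idea behind a0d9dd20ee is to split weight matrices across devices, compute partial matmuls, and reduce the partial results, with the reduction left to whichever backend primitive is available (NCCL, device copies, etc.). A backend-agnostic CPU-only sketch of that scheme, with the devices simulated by loops (illustrative only):

```c
// Tensor-parallel y = W x: each "device" owns a column slice of W and
// the matching slice of x, computes a partial y, then partials are summed
// (the step an all-reduce performs across real devices).
#include <stdio.h>

#define ROWS 4
#define COLS 6
#define N_DEVS 2

int main(void) {
    float W[ROWS][COLS], x[COLS], y[ROWS] = {0};
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++) W[i][j] = 1.0f;
    for (int j = 0; j < COLS; j++) x[j] = (float)j;

    float partial[N_DEVS][ROWS] = {{0}};
    for (int dev = 0; dev < N_DEVS; dev++) {
        const int j0 = dev * (COLS / N_DEVS), j1 = j0 + COLS / N_DEVS;
        for (int i = 0; i < ROWS; i++)
            for (int j = j0; j < j1; j++)
                partial[dev][i] += W[i][j] * x[j];
    }
    // All-reduce step: sum the per-device partial results.
    for (int dev = 0; dev < N_DEVS; dev++)
        for (int i = 0; i < ROWS; i++) y[i] += partial[dev][i];

    for (int i = 0; i < ROWS; i++) printf("y[%d] = %g\n", i, y[i]);
    return 0; // each y[i] = 0+1+2+3+4+5 = 15
}
```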