llama.cpp/ggml/src/ggml-musa
Johannes Gäßler 11f0af5504
CUDA: faster tile FA, add oob checks, more HSs (#16492)
2025-10-11 20:54:32 +02:00
File | Last commit | Date
CMakeLists.txt | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00
mudnn.cu | musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy (#13647) | 2025-05-21 09:58:49 +08:00
mudnn.cuh | musa: enable fp16 mma (all) and cublas on qy2 (#13842) | 2025-06-26 12:11:59 +08:00