Commit Graph

15 Commits

Author SHA1 Message Date
bssrdf c68fe36ae2 WIP: cleanup; enhanced test case 2025-10-25 21:57:39 -04:00
bssrdf 475f9879c5 WIP: fixed another bug 2025-10-25 20:24:14 -04:00
bssrdf 396f55831c WIP: bug fix 2025-10-25 18:14:12 -04:00
bssrdf 610e41ae2d still debugging 2025-10-25 11:10:39 -04:00
bssrdf c45df12ee7 this case is broken; to be debugged 2025-10-24 22:40:34 -04:00
bssrdf 980ddc1e87 properly use __CUDA_ARCH__ to protect the tensor path 2025-10-24 21:56:58 -04:00
bssrdf 24b553204b WIP: fixed another bug 2025-10-24 16:53:40 -04:00
bssrdf 6c90c20cb1 WIP: bug fix 2025-10-24 15:33:57 -04:00
bssrdf be25be8ed3 WIP: debugging tensor core kernel 2025-10-24 14:24:26 -04:00
bssrdf 3f99818925 unroll some loops 2025-10-15 12:46:46 -04:00
bssrdf b70cca2ea3 add support for both NCHW and NHWC layouts 2025-10-14 14:24:35 -04:00
bssrdf 2237722056 added block variants; to be debugged 2025-10-14 11:02:10 -04:00
bssrdf c6255442bb minor updates 2025-10-08 13:38:16 -04:00
bssrdf 53a2ccbe12 minor update and add direct conv in benchmarking 2025-09-24 21:48:20 -04:00
bssrdf 83a3b7d6a9 Refactor conv2d_implicit_kernel for improved bitwise operations; add test for implicit convolution 2025-09-06 17:26:19 -04:00