Commit Graph

23 Commits

Author SHA1 Message Date
bssrdf 80a996cfc0 WIP: tensor core code compiled ok 2025-10-24 11:41:11 -04:00
bssrdf 2715341c1d WIP: output 2025-10-23 21:29:45 -04:00
bssrdf 66f6d16265 WIP 2025-10-23 13:52:26 -04:00
bssrdf 215ebf6526 WIP 2025-10-22 15:56:55 -04:00
bssrdf 1b69ed44c6 WIP 2025-10-21 17:15:26 -04:00
bssrdf f931ad883f WIP 2025-10-21 17:12:50 -04:00
bssrdf f0a480cc22 WIP 2025-10-21 15:43:35 -04:00
bssrdf 6a1f8b4d57 change padding size back to 4 2025-10-15 14:21:04 -04:00
bssrdf ac77b8d0e0 change padding size to 1; added padding to input smem 2025-10-15 14:07:24 -04:00
bssrdf 3f99818925 unroll some loops 2025-10-15 12:46:46 -04:00
bssrdf b70cca2ea3 add support for both NCHW and NHWC layouts 2025-10-14 14:24:35 -04:00
bssrdf 3e2f722d11 fixed missing dilation 2025-10-14 11:12:55 -04:00
bssrdf 2237722056 added block variants; to be debugged 2025-10-14 11:02:10 -04:00
bssrdf 16b0f0ae3c work in progress 2025-10-13 18:41:30 -04:00
bssrdf 0ca43582e8 reorder register tile loop 2025-10-08 13:52:56 -04:00
bssrdf 53a2ccbe12 minor update and add direct conv in benchmarking 2025-09-24 21:48:20 -04:00
bssrdf 83a3b7d6a9 Refactor conv2d_implicit_kernel for improved bitwise operations; add test for implicit convolution 2025-09-06 17:26:19 -04:00
bssrdf 4b0f9d571f Refactor conv2d_implicit_kernel for improved readability and consistency; update parameter comments and remove unused code 2025-09-05 08:29:57 -04:00
bssrdf 5ffe97be9c Fix boundary check in conv2d_implicit_kernel to include channel limits 2025-09-04 15:32:29 -04:00
bssrdf 6d84cbb5ab Fix parameter order in conv2d_implicit and add comprehensive test cases for 2D convolution 2025-09-03 15:45:09 -04:00
bssrdf 3877608dc0 fix passing param as reference 2025-09-03 12:45:19 -04:00
bssrdf 4d772873b9 Add implicit convolution support for 2D tensors in CPU and CUDA implementations 2025-09-03 11:29:14 -04:00
bssrdf 8a589317b6 Add implicit GEMM convolution operation for 2D tensors in CUDA 2025-09-02 22:47:41 -04:00