bssrdf | be25be8ed3 | WIP: debugging tensor core kernel | 2025-10-24 14:24:26 -04:00
bssrdf | 3f99818925 | unroll some loops | 2025-10-15 12:46:46 -04:00
bssrdf | b70cca2ea3 | add support for both NCHW and NHWC layouts | 2025-10-14 14:24:35 -04:00
bssrdf | 2237722056 | added block variants; to be debugged | 2025-10-14 11:02:10 -04:00
bssrdf | c6255442bb | minor updates | 2025-10-08 13:38:16 -04:00
bssrdf | 53a2ccbe12 | minor update and add direct conv in benchmarking | 2025-09-24 21:48:20 -04:00
bssrdf | 83a3b7d6a9 | Refactor conv2d_implicit_kernel for improved bitwise operations; add test for implicit convolution | 2025-09-06 17:26:19 -04:00