bssrdf | 5e1352cb60 | add a case having memory access violation | 2025-11-10 15:33:32 -05:00 |
bssrdf | 15daa5a6a8 | added split-k mode to tensor core path | 2025-11-10 14:38:23 -05:00 |
bssrdf | a428feecdd | fuse cast to float into conv epilogue; improve swizzling for output | 2025-11-10 13:13:36 -05:00 |
bssrdf | 2357922a2f | fixed a bug now all test cases passed | 2025-11-03 08:46:17 -05:00 |
bssrdf | 3308ccef91 | conv3d WIP: enabled tensor core path | 2025-11-02 17:30:36 -05:00 |
bssrdf | 3f5c5045da | conv3d WIP: turn on tensor cores; NCDHW2NDHWC to be worked out | 2025-11-02 15:15:49 -05:00 |
bssrdf | a5b68bcea7 | conv3D WIP: fixed a launch param bug, results now correct; performance 3x slower than im2col | 2025-11-02 12:33:19 -05:00 |
bssrdf | 0a64ea8ff8 | WIP: build ok | 2025-11-02 10:34:03 -05:00 |
bssrdf | 52455b8a6d | WIP: updating indices for input and kernel; enable OP_CONV_3D for cuda backend | 2025-11-01 22:01:00 -04:00 |
bssrdf | ab15f6cd5f | use conv2d_implicit as template; add conv3d parameters | 2025-11-01 20:08:15 -04:00 |