bssrdf | 80a996cfc0 | WIP: tensore code compiled ok | 2025-10-24 11:41:11 -04:00
bssrdf | 2715341c1d | WIP: output | 2025-10-23 21:29:45 -04:00
bssrdf | 66f6d16265 | WIP | 2025-10-23 13:52:26 -04:00
bssrdf | 215ebf6526 | WIP | 2025-10-22 15:56:55 -04:00
bssrdf | 1b69ed44c6 | WIP | 2025-10-21 17:15:26 -04:00
bssrdf | f931ad883f | WIP | 2025-10-21 17:12:50 -04:00
bssrdf | f0a480cc22 | WIP | 2025-10-21 15:43:35 -04:00
bssrdf | 6a1f8b4d57 | change padding size back to 4 | 2025-10-15 14:21:04 -04:00
bssrdf | ac77b8d0e0 | change padding size to 1; added padding to input smem | 2025-10-15 14:07:24 -04:00
bssrdf | 3f99818925 | unroll some loops | 2025-10-15 12:46:46 -04:00
bssrdf | b70cca2ea3 | add support for both NCHW and NHWC layouts | 2025-10-14 14:24:35 -04:00
bssrdf | 3e2f722d11 | fixed missing dilation | 2025-10-14 11:12:55 -04:00
bssrdf | 2237722056 | added block variants; to be debugged | 2025-10-14 11:02:10 -04:00
bssrdf | 16b0f0ae3c | work in progress | 2025-10-13 18:41:30 -04:00
bssrdf | 0ca43582e8 | reorder register tile loop | 2025-10-08 13:52:56 -04:00
bssrdf | 53a2ccbe12 | minor update and add direct conv in benchmarking | 2025-09-24 21:48:20 -04:00
bssrdf | 83a3b7d6a9 | Refactor conv2d_implicit_kernel for improved bitwise operations; add test for implicit convolution | 2025-09-06 17:26:19 -04:00
bssrdf | 4b0f9d571f | Refactor conv2d_implicit_kernel for improved readability and consistency; update parameter comments and remove unused code | 2025-09-05 08:29:57 -04:00
bssrdf | 5ffe97be9c | Fix boundary check in conv2d_implicit_kernel to include channel limits | 2025-09-04 15:32:29 -04:00
bssrdf | 6d84cbb5ab | Fix parameter order in conv2d_implicit and add comprehensive test cases for 2D convolution | 2025-09-03 15:45:09 -04:00
bssrdf | 3877608dc0 | fix passing param as reference | 2025-09-03 12:45:19 -04:00
bssrdf | 4d772873b9 | Add implicit convolution support for 2D tensors in CPU and CUDA implementations | 2025-09-03 11:29:14 -04:00
bssrdf | 8a589317b6 | Add implicit GEMM convolution operation for 2D tensors in CUDA | 2025-09-02 22:47:41 -04:00