bssrdf | 3f99818925 | unroll some loops | 2025-10-15 12:46:46 -04:00
bssrdf | b70cca2ea3 | add support for both NCHW and NHWC layouts | 2025-10-14 14:24:35 -04:00
bssrdf | 3e2f722d11 | fixed missing dilation | 2025-10-14 11:12:55 -04:00
bssrdf | 2237722056 | added block variants; to be debugged | 2025-10-14 11:02:10 -04:00
bssrdf | 16b0f0ae3c | work in progress | 2025-10-13 18:41:30 -04:00
bssrdf | 0ca43582e8 | reorder register tile loop | 2025-10-08 13:52:56 -04:00
bssrdf | 53a2ccbe12 | minor update and add direct conv in benchmarking | 2025-09-24 21:48:20 -04:00
bssrdf | 83a3b7d6a9 | Refactor conv2d_implicit_kernel for improved bitwise operations; add test for implicit convolution | 2025-09-06 17:26:19 -04:00
bssrdf | 4b0f9d571f | Refactor conv2d_implicit_kernel for improved readability and consistency; update parameter comments and remove unused code | 2025-09-05 08:29:57 -04:00
bssrdf | 5ffe97be9c | Fix boundary check in conv2d_implicit_kernel to include channel limits | 2025-09-04 15:32:29 -04:00
bssrdf | 6d84cbb5ab | Fix parameter order in conv2d_implicit and add comprehensive test cases for 2D convolution | 2025-09-03 15:45:09 -04:00
bssrdf | 3877608dc0 | fix passing param as reference | 2025-09-03 12:45:19 -04:00
bssrdf | 4d772873b9 | Add implicit convolution support for 2D tensors in CPU and CUDA implementations | 2025-09-03 11:29:14 -04:00
bssrdf | 8a589317b6 | Add implicit GEMM convolution operation for 2D tensors in CUDA | 2025-09-02 22:47:41 -04:00