Commit Graph

30 Commits

Author SHA1 Message Date
bssrdf b015e4b7dc WIP: fixed bugs now results are correct 2025-11-14 11:10:34 -05:00
bssrdf 63c53fe1f1 WIP: move rs loop into block-k-loop following cutlass 2025-11-13 18:44:32 -05:00
bssrdf 0939511846 change mac loop to match cutlass 2025-11-13 15:45:43 -05:00
bssrdf 9f498d29f1 only enable m16n8k16 on ampere or above 2025-11-12 11:55:15 -05:00
bssrdf ea438d8b0e trying to reduce integer ops; simplify code 2025-11-12 11:32:27 -05:00
bssrdf c33e4301dc m16n8k16 mma works; to be cleaned up 2025-11-12 10:26:01 -05:00
bssrdf a660d4d45d get rid of a convert unary kernel call and fuse the type cast into conv epilogue 2025-11-10 12:39:50 -05:00
bssrdf 1fdcb05dc8 increase maximum split factor to 16; use better heuristics to choose split-K factor, reducing tail effect 2025-11-10 11:47:56 -05:00
bssrdf 5ed2c1b787 reduce bank conflicts in filter transpose 2025-11-09 00:51:51 -05:00
bssrdf 8e0e944b70 reduced uncoalesced global access in filter transpose 2025-11-09 00:14:56 -05:00
bssrdf a2db92f41c make CI happy 2025-11-08 20:33:05 -05:00
bssrdf a3fb36fb71 make split-k condition check more robust 2025-11-08 18:47:12 -05:00
bssrdf a1fb3c1509 fixed a bug now split-k can choose a better split factor 2025-11-08 16:45:59 -05:00
bssrdf 9cbc099493 broken for some test cases 2025-11-08 14:51:45 -05:00
bssrdf 414bb8d9ed further reduce index swizzling computation cycles 2025-11-07 23:20:46 -05:00
bssrdf 8809af79a8 now bank conflicts free and performance get a bit boosted too 2025-11-07 22:11:21 -05:00
bssrdf 949eca4cba swizzling working, may still have room to optimize 2025-11-07 19:20:12 -05:00
bssrdf df88b2c917 trying to get rid of remaining bank conflicts; also fixed a bug for split-k condition check 2025-11-07 15:38:36 -05:00
bssrdf 4e9ebe92e0 minor update 2025-11-06 22:31:28 -05:00
bssrdf ba70ad8e59 added test cases exactly replicating sdxl unet steps 2025-11-06 20:35:37 -05:00
bssrdf 311213d209 make sure there are enough channels for split-k 2025-11-06 10:21:49 -05:00
bssrdf 09e3a5f07d try to reduce index calculation 2025-11-05 22:02:57 -05:00
bssrdf 688de6d7d8 fixed bug now split-k is working 2025-11-05 13:47:38 -05:00
bssrdf 6f44f47113 added split-k mode for skinny mnk shapes 2025-11-05 13:04:37 -05:00
bssrdf 275c08d25d add more sd like test cases 2025-11-04 15:16:31 -05:00
bssrdf 00a49c2fc1 another CI fix 2025-11-03 19:49:56 -05:00
bssrdf 8572313000 remove trailing blank 2025-11-03 19:45:22 -05:00
bssrdf 27881fbe7b fixes for CI 2025-11-03 19:43:55 -05:00
bssrdf fa9e415c9b minor update of test case 2025-11-03 15:48:57 -05:00
bssrdf 417cfc3cc6 added a test case to directly compare im2col and implicit gemm 2025-10-31 19:57:28 -04:00