llama.cpp

Author	SHA1	Message	Date
bssrdf	3ea524e9c4	WIP: almost working	2025-10-27 23:10:19 -04:00
bssrdf	a3784e17ad	WIP: debugging cpy transpose	2025-10-27 15:09:03 -04:00
bssrdf	30990788e8	WIP	2025-10-27 08:29:20 -04:00
bssrdf	c68fe36ae2	WIP: cleanup; enhanced test case	2025-10-25 21:57:39 -04:00
bssrdf	475f9879c5	WIP: fixed another bug	2025-10-25 20:24:14 -04:00
bssrdf	396f55831c	WIP: bug fix	2025-10-25 18:14:12 -04:00
bssrdf	610e41ae2d	still debugging	2025-10-25 11:10:39 -04:00
bssrdf	c45df12ee7	this case is broken; to be debugged	2025-10-24 22:40:34 -04:00
bssrdf	980ddc1e87	properly use __CUDA_ARCH__ to protect the tensor path	2025-10-24 21:56:58 -04:00
bssrdf	24b553204b	WIP: fixed another bug	2025-10-24 16:53:40 -04:00
bssrdf	6c90c20cb1	WIP: bug fix	2025-10-24 15:33:57 -04:00
bssrdf	be25be8ed3	WIP: debugging tensor core kernel	2025-10-24 14:24:26 -04:00
bssrdf	3f99818925	unroll some loops	2025-10-15 12:46:46 -04:00
bssrdf	b70cca2ea3	add support for both NCHW and NHWC layouts	2025-10-14 14:24:35 -04:00
bssrdf	2237722056	added block variants; to be debugged	2025-10-14 11:02:10 -04:00
bssrdf	c6255442bb	minor updates	2025-10-08 13:38:16 -04:00
bssrdf	53a2ccbe12	minor update and add direct conv in benchmarking	2025-09-24 21:48:20 -04:00
bssrdf	83a3b7d6a9	Refactor conv2d_implicit_kernel for improved bitwise operations; add test for implicit convolution	2025-09-06 17:26:19 -04:00