llama.cpp

Commit Graph

Author	SHA1	Message	Date
bssrdf	ecbbdb6608	reducing integer ops	2025-11-14 13:05:31 -05:00
bssrdf	0cb1ff419a	move some register to const memory space	2025-11-14 12:02:13 -05:00
bssrdf	b015e4b7dc	WIP: fixed bugs now results are correct	2025-11-14 11:10:34 -05:00
bssrdf	7d99222a61	WIP: debugging	2025-11-13 22:08:41 -05:00
bssrdf	0939511846	change mac loop to match cutlass	2025-11-13 15:45:43 -05:00
bssrdf	fac6f0adc3	add missing batch index bounds check	2025-11-10 20:05:39 -05:00
bssrdf	a1fb3c1509	fixed a bug now split-k can choose a better split factor	2025-11-08 16:45:59 -05:00
bssrdf	68ccd2a899	refactor cuda core code path	2025-11-06 09:54:01 -05:00
bssrdf	09e3a5f07d	try to reduce index calculation	2025-11-05 22:02:57 -05:00
bssrdf	688de6d7d8	fixed bug now split-k is working	2025-11-05 13:47:38 -05:00
bssrdf	6f44f47113	added split-k mode for skinny mnk shapes	2025-11-05 13:04:37 -05:00
bssrdf	c1f67c19e0	make CI happy	2025-10-29 23:23:21 -04:00
bssrdf	2b5351a898	make CI happy	2025-10-29 23:17:36 -04:00
bssrdf	2dfbbee73f	clean up	2025-10-29 13:19:35 -04:00
bssrdf	980ddc1e87	properly use __CUDA_ARCH__ to protect the tensor path	2025-10-24 21:56:58 -04:00
bssrdf	6c90c20cb1	WIP: bug fix	2025-10-24 15:33:57 -04:00
bssrdf	be25be8ed3	WIP: debugging tensor core kernel	2025-10-24 14:24:26 -04:00
bssrdf	80a996cfc0	WIP: tensore code compiled ok	2025-10-24 11:41:11 -04:00
bssrdf	66f6d16265	WIP	2025-10-23 13:52:26 -04:00
bssrdf	215ebf6526	WIP	2025-10-22 15:56:55 -04:00
bssrdf	f931ad883f	WIP	2025-10-21 17:12:50 -04:00
bssrdf	b70cca2ea3	add support for both NCHW and NHWC layouts	2025-10-14 14:24:35 -04:00
bssrdf	2237722056	added block variants; to be debugged	2025-10-14 11:02:10 -04:00
bssrdf	16b0f0ae3c	work in progress	2025-10-13 18:41:30 -04:00
bssrdf	8a589317b6	Add implicit GEMM convolution operation for 2D tensors in CUDA	2025-09-02 22:47:41 -04:00

25 Commits