| Name | Last commit | Date |
| --- | --- | --- |
| fattn-mma-f16-instance-ncols1_1-ncols2_8.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_1-ncols2_16.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_2-ncols2_4.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_2-ncols2_8.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_2-ncols2_16.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_4-ncols2_2.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_4-ncols2_4.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_4-ncols2_8.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_4-ncols2_16.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_8-ncols2_1.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_8-ncols2_2.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_8-ncols2_4.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_8-ncols2_8.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_16-ncols2_1.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_16-ncols2_2.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_16-ncols2_4.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_32-ncols2_1.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_32-ncols2_2.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-mma-f16-instance-ncols1_64-ncols2_1.cu | CUDA: FA support for Deepseek (Ampere or newer) (#13306) | 2025-05-09 13:34:58 +02:00 |
| fattn-tile-instance-dkq40-dv40.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq64-dv64.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq72-dv72.cu | ggml: CUDA: add head size 72 for flash-attn (#16962) | 2025-11-03 14:29:11 +01:00 |
| fattn-tile-instance-dkq80-dv80.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq96-dv96.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq112-dv112.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq128-dv128.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq256-dv256.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-tile-instance-dkq576-dv512.cu | CUDA: faster tile FA, add oob checks, more HSs (#16492) | 2025-10-11 20:54:32 +02:00 |
| fattn-vec-instance-f16-f16.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-f16-q4_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-f16-q4_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-f16-q5_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-f16-q5_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-f16-q8_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_0-f16.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_0-q4_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_0-q4_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_0-q5_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_0-q5_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_0-q8_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_1-f16.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_1-q4_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_1-q4_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_1-q5_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_1-q5_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q4_1-q8_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_0-f16.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_0-q4_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_0-q4_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_0-q5_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_0-q5_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_0-q8_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_1-f16.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_1-q4_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_1-q4_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_1-q5_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_1-q5_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q5_1-q8_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q8_0-f16.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q8_0-q4_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q8_0-q4_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q8_0-q5_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q8_0-q5_1.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| fattn-vec-instance-q8_0-q8_0.cu | CUDA: refactor and deduplicate vector FA kernels (#16208) | 2025-09-27 18:45:07 +02:00 |
| generate_cu_files.py | ggml: CUDA: add head size 72 for flash-attn (#16962) | 2025-11-03 14:29:11 +01:00 |
| mmf-instance-ncols_1.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_2.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_3.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_4.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_5.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_6.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_7.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_8.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_9.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_10.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_11.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_12.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_13.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_14.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_15.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmf-instance-ncols_16.cu | CUDA: Add mul_mat_id support for the mmf kernel (#15767) | 2025-09-09 14:38:02 +08:00 |
| mmq-instance-iq1_s.cu | CUDA: MMQ code deduplication + iquant support (#8495) | 2024-07-20 22:25:26 +02:00 |
| mmq-instance-iq2_s.cu | CUDA: MMQ code deduplication + iquant support (#8495) | 2024-07-20 22:25:26 +02:00 |
| mmq-instance-iq2_xs.cu | CUDA: MMQ code deduplication + iquant support (#8495) | 2024-07-20 22:25:26 +02:00 |
| mmq-instance-iq2_xxs.cu | CUDA: MMQ code deduplication + iquant support (#8495) | 2024-07-20 22:25:26 +02:00 |
| mmq-instance-iq3_s.cu | CUDA: MMQ code deduplication + iquant support (#8495) | 2024-07-20 22:25:26 +02:00 |
| mmq-instance-iq3_xxs.cu | CUDA: MMQ code deduplication + iquant support (#8495) | 2024-07-20 22:25:26 +02:00 |
| mmq-instance-iq4_nl.cu | CUDA: MMQ support for iq4_nl, iq4_xs (#8278) | 2024-07-05 09:06:31 +02:00 |
| mmq-instance-iq4_xs.cu | CUDA: MMQ support for iq4_nl, iq4_xs (#8278) | 2024-07-05 09:06:31 +02:00 |
| mmq-instance-mxfp4.cu | llama : add gpt-oss (#15091) | 2025-08-05 22:10:36 +03:00 |
| mmq-instance-q2_k.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q3_k.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q4_0.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q4_1.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q4_k.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q5_0.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q5_1.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q5_k.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q6_k.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
| mmq-instance-q8_0.cu | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00 |
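
The file names in the listing above follow regular patterns: the `fattn-vec` instances cover every ordered pair drawn from six type names, and the `mmf` instances run from `ncols_1` to `ncols_16`. The sketch below only reproduces those names from the patterns visible in the listing. It is not the contents of `generate_cu_files.py`, and the helper names `fattn_vec_names` and `mmf_names` are hypothetical.

```python
from itertools import product

# Type names as they appear in the fattn-vec file names of the listing.
VEC_TYPES = ["f16", "q4_0", "q4_1", "q5_0", "q5_1", "q8_0"]


def fattn_vec_names():
    """Return one file name per ordered pair of types (6 x 6 = 36 names)."""
    return [
        f"fattn-vec-instance-{a}-{b}.cu"
        for a, b in product(VEC_TYPES, repeat=2)
    ]


def mmf_names(max_ncols=16):
    """Return mmf instance file names for ncols = 1 .. max_ncols."""
    return [f"mmf-instance-ncols_{n}.cu" for n in range(1, max_ncols + 1)]


if __name__ == "__main__":
    for name in fattn_vec_names() + mmf_names():
        print(name)
```

Running the script prints the 36 `fattn-vec` names followed by the 16 `mmf` names, which can be compared against the directory contents to spot a missing or extra instance file.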