| add.cl | opencl: allow mixed f16/f32 `add` (#15140) | 2025-08-12 02:42:41 -07:00 |
| add_id.cl | opencl: add `swiglu_oai` and `add_id` (#15121) | 2025-08-06 12:12:17 -07:00 |
| argsort.cl | opencl: add new ops - `argsort`, `div`, `sub`, `addrows`, `sigmoid`, `group_norm` (#13787) | 2025-05-27 12:56:08 -07:00 |
| clamp.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| concat.cl | opencl: refactor some ops, concat, repeat, tanh and scale (#19226) | 2026-02-02 15:54:43 -08:00 |
| conv2d.cl | opencl: add conv2d kernel (#14403) | 2025-07-21 10:03:19 -07:00 |
| conv2d_f16_f32.cl | opencl: add conv2d kernel (#14403) | 2025-07-21 10:03:19 -07:00 |
| cpy.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| cvt.cl | opencl: add basic support for q4_1 (#19534) | 2026-02-12 14:52:37 -08:00 |
| diag_mask_inf.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| div.cl | opencl: add f16 for `add`, `sub`, `mul`, `div` (#14984) | 2025-08-01 13:15:44 +02:00 |
| embed_kernel.py | Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (#10693) | 2024-12-13 12:23:52 -08:00 |
| expm1.cl | opencl: add EXPM1 op (#18704) | 2026-01-09 10:13:13 -08:00 |
| fill.cl | opencl: add FILL op support (#18682) | 2026-01-07 22:04:50 -08:00 |
| flash_attn_f16.cl | opencl: add attn sinks support for FA kernels (#15706) | 2025-09-01 23:26:53 -07:00 |
| flash_attn_f32.cl | opencl: fix FA for f32 (#16584) | 2025-10-15 10:48:28 -07:00 |
| flash_attn_f32_f16.cl | opencl: add attn sinks support for FA kernels (#15706) | 2025-09-01 23:26:53 -07:00 |
| gelu.cl | opencl: add GELU_ERF (#14476) | 2025-07-04 23:24:56 -07:00 |
| gemm_moe_mxfp4_f32.cl | opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602) | 2025-10-17 17:55:32 -07:00 |
| gemv_moe_mxfp4_f32.cl | opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602) | 2025-10-17 17:55:32 -07:00 |
| gemv_noshuffle.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| gemv_noshuffle_general.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| gemv_noshuffle_general_q8_0_f32.cl | opencl: add optimized q8_0 mm kernel for adreno (#18871) | 2026-01-30 10:19:27 -08:00 |
| get_rows.cl | opencl: support ne3 in get_rows (#15866) | 2025-09-30 09:55:13 -07:00 |
| glu.cl | opencl: add `swiglu_oai` and `add_id` (#15121) | 2025-08-06 12:12:17 -07:00 |
| group_norm.cl | OpenCL: add fused group_norm/norm, mul, add (#15314) | 2025-08-26 23:36:05 -07:00 |
| im2col_f16.cl | opencl: fix `im2col` when `KW!=KH` (#14803) | 2025-07-21 13:55:10 -07:00 |
| im2col_f32.cl | opencl: fix `im2col` when `KW!=KH` (#14803) | 2025-07-21 13:55:10 -07:00 |
| mean.cl | opencl: add sqr, sqrt, mean and ssm_conv (#17476) | 2025-11-26 13:29:58 -08:00 |
| mul.cl | opencl: add f16 for `add`, `sub`, `mul`, `div` (#14984) | 2025-08-01 13:15:44 +02:00 |
| mul_mat_Ab_Bi_8x4.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mat_f16_f32.cl | opencl: add tiled mul_mat_f16_f32 (#14535) | 2025-07-10 14:58:12 -07:00 |
| mul_mm_f16_f32_kq_kqv.cl | opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181) | 2025-11-15 17:33:10 -08:00 |
| mul_mm_f16_f32_l4_lm.cl | opencl: fix boundary handling for mul_mm (#16875) | 2025-10-30 16:00:20 -07:00 |
| mul_mm_f32_f32_l4_lm.cl | opencl: fix boundary handling for mul_mm (#16875) | 2025-10-30 16:00:20 -07:00 |
| mul_mm_q4_0_f32_l4_lm.cl | opencl: add basic support for q4_1 (#19534) | 2026-02-12 14:52:37 -08:00 |
| mul_mm_q4_1_f32_l4_lm.cl | opencl: add basic support for q4_1 (#19534) | 2026-02-12 14:52:37 -08:00 |
| mul_mm_q6_k_f32_l4_lm.cl | opencl: add general Q6_K mm and Q4_K mv (#19347) | 2026-02-11 10:33:13 -08:00 |
| mul_mm_q8_0_f32_8x4.cl | opencl: add optimized q8_0 mm kernel for adreno (#18871) | 2026-01-30 10:19:27 -08:00 |
| mul_mm_q8_0_f32_l4_lm.cl | opencl: fix boundary handling for mul_mm (#16875) | 2025-10-30 16:00:20 -07:00 |
| mul_mv_f16_f16.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_f16_f32.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_f16_f32_1row.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_f16_f32_l4.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_f32_f32.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_id_mxfp4_f32.cl | opencl: add initial mxfp4 support via mv (#15270) | 2025-08-15 09:52:14 -07:00 |
| mul_mv_id_mxfp4_f32_flat.cl | opencl: optimize mxfp4 kernels (#16037) | 2025-09-18 12:03:34 -07:00 |
| mul_mv_id_q4_0_f32_8x_flat.cl | opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003) | 2025-06-10 16:55:58 -07:00 |
| mul_mv_id_q8_0_f32.cl | opencl: initial `q8_0` mv support (#15732) | 2025-09-21 14:48:44 -07:00 |
| mul_mv_id_q8_0_f32_flat.cl | opencl: initial `q8_0` mv support (#15732) | 2025-09-21 14:48:44 -07:00 |
| mul_mv_mxfp4_f32.cl | opencl: add initial mxfp4 support via mv (#15270) | 2025-08-15 09:52:14 -07:00 |
| mul_mv_mxfp4_f32_flat.cl | opencl: optimize mxfp4 kernels (#16037) | 2025-09-18 12:03:34 -07:00 |
| mul_mv_q4_0_f32.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_q4_0_f32_1d_8x_flat.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_q4_0_f32_1d_16x_flat.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_q4_0_f32_8x_flat.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_q4_0_f32_v.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| mul_mv_q4_1_f32.cl | opencl: add basic support for q4_1 (#19534) | 2026-02-12 14:52:37 -08:00 |
| mul_mv_q4_1_f32_flat.cl | opencl: add basic support for q4_1 (#19534) | 2026-02-12 14:52:37 -08:00 |
| mul_mv_q4_k_f32.cl | opencl: add general Q6_K mm and Q4_K mv (#19347) | 2026-02-11 10:33:13 -08:00 |
| mul_mv_q6_k_f32.cl | opencl: add flattened q6_K mv (#19054) | 2026-01-26 19:36:24 -08:00 |
| mul_mv_q6_k_f32_flat.cl | opencl: add flattened q6_K mv (#19054) | 2026-01-26 19:36:24 -08:00 |
| mul_mv_q8_0_f32.cl | opencl: initial `q8_0` mv support (#15732) | 2025-09-21 14:48:44 -07:00 |
| mul_mv_q8_0_f32_flat.cl | opencl: initial `q8_0` mv support (#15732) | 2025-09-21 14:48:44 -07:00 |
| norm.cl | OpenCL: add fused group_norm/norm, mul, add (#15314) | 2025-08-26 23:36:05 -07:00 |
| pad.cl | opencl: support pad_ext (#15888) | 2025-09-30 10:45:45 -07:00 |
| relu.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| repeat.cl | opencl: refactor some ops, concat, repeat, tanh and scale (#19226) | 2026-02-02 15:54:43 -08:00 |
| rms_norm.cl | opencl: fix rms_norm_mul (#17250) | 2025-11-15 17:40:14 -08:00 |
| rope.cl | opencl: support imrope (#16914) | 2025-11-03 11:47:57 -08:00 |
| scale.cl | opencl: refactor some ops, concat, repeat, tanh and scale (#19226) | 2026-02-02 15:54:43 -08:00 |
| set_rows.cl | opencl: add fastdiv and use it in set_rows, ported from cuda (#17090) | 2025-11-10 15:00:13 -08:00 |
| sigmoid.cl | opencl: add new ops - `argsort`, `div`, `sub`, `addrows`, `sigmoid`, `group_norm` (#13787) | 2025-05-27 12:56:08 -07:00 |
| silu.cl | opencl: split `ggml-opencl.cl` into multiple files and cleanup (#12886) | 2025-04-15 12:26:00 -07:00 |
| softmax_4_f16.cl | opencl: support sink in `soft_max` (attn sinks) (#15152) | 2025-08-07 21:47:03 -07:00 |
| softmax_4_f32.cl | opencl: support sink in `soft_max` (attn sinks) (#15152) | 2025-08-07 21:47:03 -07:00 |
| softmax_f16.cl | opencl: support sink in `soft_max` (attn sinks) (#15152) | 2025-08-07 21:47:03 -07:00 |
| softmax_f32.cl | opencl: support sink in `soft_max` (attn sinks) (#15152) | 2025-08-07 21:47:03 -07:00 |
| softplus.cl | opencl: add SOFTPLUS op support (#18726) | 2026-01-10 21:57:44 -08:00 |
| solve_tri.cl | OpenCL: add SOLVE_TRI op support (#18846) | 2026-01-15 11:17:17 -08:00 |
| sqr.cl | opencl: add sqr, sqrt, mean and ssm_conv (#17476) | 2025-11-26 13:29:58 -08:00 |
| sqrt.cl | opencl: add sqr, sqrt, mean and ssm_conv (#17476) | 2025-11-26 13:29:58 -08:00 |
| ssm_conv.cl | opencl: add sqr, sqrt, mean and ssm_conv (#17476) | 2025-11-26 13:29:58 -08:00 |
| sub.cl | opencl: add f16 for `add`, `sub`, `mul`, `div` (#14984) | 2025-08-01 13:15:44 +02:00 |
| sum_rows.cl | opencl: add new ops - `argsort`, `div`, `sub`, `addrows`, `sigmoid`, `group_norm` (#13787) | 2025-05-27 12:56:08 -07:00 |
| tanh.cl | opencl: refactor some ops, concat, repeat, tanh and scale (#19226) | 2026-02-02 15:54:43 -08:00 |
| transpose.cl | opencl: unpack q4_0 for adreno in get_tensor (#18278) | 2025-12-22 10:19:01 -08:00 |
| tri.cl | opencl: add TRI op support (#18979) | 2026-01-21 22:05:54 -08:00 |
| tsembd.cl | ggml : fix padding in timestep embedding kernels (#15932) | 2025-09-16 15:25:57 +02:00 |
| upscale.cl | opencl : update upscale to support align corners (#14488) | 2025-07-02 09:07:42 +02:00 |