llama.cpp

Jeff Bolz 1fe00296f5 vulkan: fuse adds (#15252)	2025-08-16 11:48:22 -05:00

* vulkan: fuse adds

  Fuse adds that have the same shape, which are common in MoE models. It will currently fuse up to 6 adds, because we assume no more than 8 descriptors per dispatch. But this could be changed.
* check runtimeDescriptorArray feature
* disable multi_add for Intel due to likely driver bug
CMakeLists.txt	vulkan: Fix GGML_VULKAN_SHADER_DEBUG_INFO (#14427)	2025-06-27 22:35:30 -05:00
acc.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
add.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
add_id.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
argmax.comp	vulkan : fix out-of-bounds access in argmax kernel (#15342)	2025-08-15 16:16:36 +02:00
argsort.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
clamp.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
concat.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
contig_copy.comp	vulkan: Add bfloat16 support (#12554)	2025-05-01 20:49:39 +02:00
conv2d_dw.comp	sync : ggml (#13268)	2025-05-02 20:54:30 +03:00
conv2d_mm.comp	vulkan: Use coopmat2 for conv2d (#14982)	2025-08-03 14:23:57 +02:00
conv_transpose_1d.comp	ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)	2025-06-04 22:02:00 +02:00
copy.comp	vulkan: Add bfloat16 support (#12554)	2025-05-01 20:49:39 +02:00
copy_from_quant.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
copy_to_quant.comp	vulkan: add RTE variants for glu/add/sub/mul/div (#14653)	2025-07-15 21:32:11 +02:00
cos.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
count_equal.comp	vulkan: implement several ops relevant for ggml_opt (#11769)	2025-02-17 07:55:57 +01:00
dequant_f32.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
dequant_funcs.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
dequant_funcs_cm2.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
dequant_head.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
dequant_iq1_m.comp	vulkan: fix warnings (#13626)	2025-05-20 21:35:16 +00:00
dequant_iq1_s.comp	vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)	2025-02-15 09:01:40 +01:00
dequant_iq2_s.comp	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)	2025-01-29 18:29:39 +01:00
dequant_iq2_xs.comp	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)	2025-01-29 18:29:39 +01:00
dequant_iq2_xxs.comp	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)	2025-01-29 18:29:39 +01:00
dequant_iq3_s.comp	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)	2025-01-29 18:29:39 +01:00
dequant_iq3_xxs.comp	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)	2025-01-29 18:29:39 +01:00
dequant_iq4_nl.comp	vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)	2025-01-29 18:29:39 +01:00
dequant_iq4_xs.comp	vulkan: initial support for IQ4_XS quantization (#11501)	2025-02-06 07:09:59 +01:00
dequant_mxfp4.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
dequant_q2_k.comp	vulkan: fix noncontig check for mat_mul_id splitting (#14683)	2025-07-15 21:51:09 +02:00
dequant_q3_k.comp	vulkan: fix noncontig check for mat_mul_id splitting (#14683)	2025-07-15 21:51:09 +02:00
dequant_q4_0.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
dequant_q4_1.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
dequant_q4_k.comp	vulkan: fix noncontig check for mat_mul_id splitting (#14683)	2025-07-15 21:51:09 +02:00
dequant_q5_0.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
dequant_q5_1.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
dequant_q5_k.comp	vulkan: fix noncontig check for mat_mul_id splitting (#14683)	2025-07-15 21:51:09 +02:00
dequant_q6_k.comp	vulkan: fix noncontig check for mat_mul_id splitting (#14683)	2025-07-15 21:51:09 +02:00
dequant_q8_0.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
diag_mask_inf.comp	vulkan: fix diag_mask_inf (#11323)	2025-01-23 08:01:17 +01:00
div.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
flash_attn.comp	vulkan: support fattn sinks (#15126)	2025-08-07 22:44:20 +02:00
flash_attn_base.comp	vulkan: support fattn sinks (#15126)	2025-08-07 22:44:20 +02:00
flash_attn_cm1.comp	vulkan: Support mul_mat_id with f32 accumulators (#15337)	2025-08-16 11:18:31 +02:00
flash_attn_cm2.comp	vulkan: support fattn sinks (#15126)	2025-08-07 22:44:20 +02:00
flash_attn_split_k_reduce.comp	vulkan: support fattn sinks (#15126)	2025-08-07 22:44:20 +02:00
geglu.comp	ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)	2025-06-29 11:04:10 +02:00
geglu_erf.comp	ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)	2025-07-03 23:07:22 +02:00
geglu_quick.comp	ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445)	2025-07-03 23:07:22 +02:00
gelu.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
gelu_erf.comp	add GELU_ERF (#14455)	2025-07-01 10:14:21 +02:00
gelu_quick.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
generic_binary_head.comp	vulkan: fuse adds (#15252)	2025-08-16 11:48:22 -05:00
generic_head.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
generic_unary_head.comp	vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11166)	2025-01-16 22:47:10 +01:00
get_rows.comp	vulkan: Add bfloat16 support (#12554)	2025-05-01 20:49:39 +02:00
get_rows_quant.comp	vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595)	2025-02-28 09:42:52 +01:00
glu_head.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
glu_main.comp	ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)	2025-06-29 11:04:10 +02:00
group_norm.comp	vulkan: fix group_norm (#10496)	2024-11-26 16:45:05 +01:00
im2col.comp	vulkan/cuda: Fix im2col when KW!=KH (#14789)	2025-07-21 13:35:40 +02:00
l2_norm.comp	llama: Add support for RWKV v7 architecture (#12412)	2025-03-18 07:27:50 +08:00
leaky_relu.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
mul.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
mul_mat_split_k_reduce.comp	vulkan: optimize and reenable split_k (#10637)	2024-12-03 20:29:54 +01:00
mul_mat_vec.comp	vulkan: Add bfloat16 support (#12554)	2025-05-01 20:49:39 +02:00
mul_mat_vec_base.comp	vulkan: optimize mul_mat for small values of N (#10991)	2024-12-30 18:27:11 +01:00
mul_mat_vec_iq1_m.comp	vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)	2025-02-15 09:01:40 +01:00
mul_mat_vec_iq1_s.comp	vulkan: initial support for IQ1_S and IQ1_M quantizations (#11528)	2025-02-15 09:01:40 +01:00
mul_mat_vec_iq2_s.comp	vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (#12472)	2025-03-21 20:27:47 +01:00
mul_mat_vec_iq2_xs.comp	vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595)	2025-02-28 09:42:52 +01:00
mul_mat_vec_iq2_xxs.comp	vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595)	2025-02-28 09:42:52 +01:00
mul_mat_vec_iq3_s.comp	vulkan: workaround for AMD Windows driver 16 bit unpack8 bug (#12472)	2025-03-21 20:27:47 +01:00
mul_mat_vec_iq3_xxs.comp	vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizations (#11595)	2025-02-28 09:42:52 +01:00
mul_mat_vec_nc.comp	vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (#15015)	2025-08-02 10:48:30 +02:00
mul_mat_vec_p021.comp	vulkan: Optimize mul_mat_vec p021 and nc shaders (#12505)	2025-03-22 09:40:11 +01:00
mul_mat_vec_q2_k.comp	mat vec double buffer (#12188)	2025-03-10 19:28:11 +00:00
mul_mat_vec_q3_k.comp	mat vec double buffer (#12188)	2025-03-10 19:28:11 +00:00
mul_mat_vec_q4_k.comp	vulkan: scale caching for k quants + misc fixes (#11081)	2025-01-15 19:50:13 +00:00
mul_mat_vec_q5_k.comp	vulkan: scale caching for k quants + misc fixes (#11081)	2025-01-15 19:50:13 +00:00
mul_mat_vec_q6_k.comp	mat vec double buffer (#12188)	2025-03-10 19:28:11 +00:00
mul_mm.comp	vulkan: Add missing bounds checking to scalar/coopmat1 mul_mat_id (#15334)	2025-08-16 10:58:38 +02:00
mul_mm_cm2.comp	vulkan: optimizations for deepseek prompt processing (#14555)	2025-07-12 11:51:58 +02:00
mul_mmq.comp	vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)	2025-05-09 09:23:41 +02:00
mul_mmq_funcs.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
multi_add.comp	vulkan: fuse adds (#15252)	2025-08-16 11:48:22 -05:00
norm.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
opt_step_adamw.comp	vulkan: implement several ops relevant for ggml_opt (#11769)	2025-02-17 07:55:57 +01:00
opt_step_sgd.comp	finetune: SGD optimizer, more CLI args (#13873)	2025-08-14 12:03:57 +02:00
pad.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
pool2d.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
quantize_q8_1.comp	Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)	2025-03-31 14:37:01 +02:00
reglu.comp	ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)	2025-06-29 11:04:10 +02:00
relu.comp	vulkan: Additional type support for unary, binary, and copy (#13266)	2025-05-04 07:17:16 +02:00
repeat.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
repeat_back.comp	vulkan: implement several ops relevant for ggml_opt (#11769)	2025-02-17 07:55:57 +01:00
rms_norm.comp	vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817)	2025-07-22 17:35:21 +02:00
rms_norm_back.comp	vulkan: implement more backpropagation operators (#11914)	2025-02-25 12:04:45 +01:00
roll.comp	vulkan : implement ggml_roll (ggml/1290)	2025-07-12 14:25:44 +03:00
rope_head.comp	vulkan: add RTE variants for glu/add/sub/mul/div (#14653)	2025-07-15 21:32:11 +02:00
rope_multi.comp	vulkan : fix rope with partial rotation and non-cont src (#14582)	2025-07-08 15:21:21 +02:00
rope_neox.comp	vulkan : fix rope with partial rotation and non-cont src (#14582)	2025-07-08 15:21:21 +02:00
rope_norm.comp	vulkan : fix rope with partial rotation and non-cont src (#14582)	2025-07-08 15:21:21 +02:00
rope_vision.comp	vulkan: support multi/vision rope, and noncontiguous rope (#11902)	2025-02-16 08:52:23 +01:00
rte.comp	vulkan: add RTE variants for glu/add/sub/mul/div (#14653)	2025-07-15 21:32:11 +02:00
scale.comp	ggml : add ggml_scale_bias (#14417)	2025-07-09 18:16:12 +02:00
sigmoid.comp	vulkan: Additional type support for unary, binary, and copy (#13266)	2025-05-04 07:17:16 +02:00
silu.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
silu_back.comp	vulkan: implement more backpropagation operators (#11914)	2025-02-25 12:04:45 +01:00
sin.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
soft_max.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
soft_max_back.comp	vulkan: implement more backpropagation operators (#11914)	2025-02-25 12:04:45 +01:00
square.comp	vulkan: Use push constant offset to handle misaligned descriptors (#10987)	2024-12-29 09:35:11 +01:00
sub.comp	vulkan: implement several ops relevant for ggml_opt (#11769)	2025-02-17 07:55:57 +01:00
sum_rows.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
swiglu.comp	ggml : implement REGLU/GEGLU/SWIGLU ops (#14158)	2025-06-29 11:04:10 +02:00
swiglu_oai.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
tanh.comp	vulkan: Additional type support for unary, binary, and copy (#13266)	2025-05-04 07:17:16 +02:00
test_bfloat16_support.comp	vulkan: Add bfloat16 support (#12554)	2025-05-01 20:49:39 +02:00
test_coopmat2_support.comp	vulkan: compile a test shader in cmake to check for coopmat2 support (#10713)	2024-12-08 09:05:55 +01:00
test_coopmat_support.comp	Disable GL_KHR_cooperative_matrix Vulkan extension if not available. (#11117)	2025-01-08 09:18:13 +01:00
test_integer_dot_support.comp	Vulkan: Add DP4A MMQ and Q8_1 quantization shader (#12135)	2025-03-31 14:37:01 +02:00
timestep_embedding.comp	ggml : build backends as libraries (#10256)	2024-11-14 18:04:35 +01:00
types.comp	llama : add gpt-oss (#15091)	2025-08-05 22:10:36 +03:00
upscale.comp	vulkan : remove unused vars (#0)	2025-07-12 14:25:44 +03:00
utils.comp	vulkan: fuse adds (#15252)	2025-08-16 11:48:22 -05:00
vulkan-shaders-gen.cpp	vulkan: fuse adds (#15252)	2025-08-16 11:48:22 -05:00
wkv6.comp	rwkv6: add wkv6 support for Vulkan backend (#10829)	2024-12-16 22:00:46 +01:00
wkv7.comp	llama: Add support for RWKV v7 architecture (#12412)	2025-03-18 07:27:50 +08:00
