llama.cpp


Zheyuan Chen	bd90fc74c3	2026-01-29 14:05:30 -08:00
ggml-webgpu: improve flastAttention performance by software pipelining (#19151)

* webgpu : pipeline flash_attn Q/K loads in WGSL
* ggml-webgpu: unroll QK accumlation inner loop
  ggml-webgpu: vectorization
* ggml-webgpu: unrolling
* ggml-webgpu: remove redundant unrolling
* ggml-webgpu: restore the config
* ggml-webgpu: remove redundant comments
* ggml-webgpu: formatting
* ggml-webgpu: formatting and remove vectorization
* ggml-webgpu: remove unnecessary constants
* ggml-webgpu: change QKV buffer to read_write to pass validation
* ggml-webgpu: add explanation for the additional bracket around Q K accumulate
* Indentation and for -> if for tail
* Kick off CI on wgsl only commits

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
File	Last commit	Date
argmax.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
argsort.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
argsort_merge.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
bin_op.tmpl.wgsl	ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)	2025-09-30 09:57:51 -07:00
binary_head.tmpl	GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)	2025-09-17 13:09:40 -07:00
common_decls.tmpl	GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018)	2025-09-17 13:09:40 -07:00
cpy.tmpl.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
cumsum.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
embed_wgsl.py	ggml webgpu: unary op suppport, code refactoring, ops support (#17764)	2025-12-05 12:25:51 -08:00
flash_attn.wgsl	ggml-webgpu: improve flastAttention performance by software pipelining (#19151)	2026-01-29 14:05:30 -08:00
get_rows.tmpl.wgsl	ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)	2025-09-30 09:57:51 -07:00
glu.tmpl.wgsl	ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)	2025-09-30 09:57:51 -07:00
memset.wgsl	ggml WebGPU: add support for quantization types (#15440)	2025-08-22 11:28:03 -07:00
mul_mat.tmpl.wgsl	ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)	2025-11-07 19:27:20 -08:00
mul_mat_decls.tmpl	ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)	2025-11-07 19:27:20 -08:00
mul_mat_reg_tile.tmpl.wgsl	ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)	2025-11-07 19:27:20 -08:00
mul_mat_subgroup_matrix.tmpl.wgsl	ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)	2025-11-07 19:27:20 -08:00
mul_mat_vec.tmpl.wgsl	ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031)	2025-11-07 19:27:20 -08:00
pad.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
rms_norm.wgsl	ggml webgpu: actually add softmax, fix rms_norm offset (#16400)	2025-10-04 20:59:31 -07:00
rope.tmpl.wgsl	model: add support for qwen3vl series (#16780)	2025-10-30 16:19:14 +01:00
scale.tmpl.wgsl	ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)	2025-09-30 09:57:51 -07:00
set_rows.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
soft_max.tmpl.wgsl	ggml webgpu: actually add softmax, fix rms_norm offset (#16400)	2025-10-04 20:59:31 -07:00
sum_rows.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
unary.wgsl	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00