llama.cpp/ggml/src/ggml-webgpu/wgsl-shaders
Zheyuan Chen bd90fc74c3
ggml-webgpu: improve flash attention performance by software pipelining (#19151)
* webgpu : pipeline flash_attn Q/K loads in WGSL

* ggml-webgpu: unroll Q*K accumulation inner loop

* ggml-webgpu: vectorization

* ggml-webgpu: unrolling

* ggml-webgpu: remove redundant unrolling

* ggml-webgpu: restore the config

* ggml-webgpu: remove redundant comments

* ggml-webgpu: formatting

* ggml-webgpu: formatting and remove vectorization

* ggml-webgpu: remove unnecessary constants

* ggml-webgpu: change QKV buffer to read_write to pass validation

* ggml-webgpu: add explanation for the additional bracket around Q K accumulate

* Indentation and for -> if for tail

* Kick off CI on wgsl only commits

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
2026-01-29 14:05:30 -08:00
argmax.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
argsort.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
argsort_merge.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
bin_op.tmpl.wgsl ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) 2025-09-30 09:57:51 -07:00
binary_head.tmpl GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018) 2025-09-17 13:09:40 -07:00
common_decls.tmpl GGML WebGPU: Support for ADD, MUL, RMS_NORM, GET_ROWS operators (#16018) 2025-09-17 13:09:40 -07:00
cpy.tmpl.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
cumsum.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
embed_wgsl.py ggml webgpu: unary op support, code refactoring, ops support (#17764) 2025-12-05 12:25:51 -08:00
flash_attn.wgsl ggml-webgpu: improve flash attention performance by software pipelining (#19151) 2026-01-29 14:05:30 -08:00
get_rows.tmpl.wgsl ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) 2025-09-30 09:57:51 -07:00
glu.tmpl.wgsl ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) 2025-09-30 09:57:51 -07:00
memset.wgsl ggml WebGPU: add support for quantization types (#15440) 2025-08-22 11:28:03 -07:00
mul_mat.tmpl.wgsl ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) 2025-11-07 19:27:20 -08:00
mul_mat_decls.tmpl ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) 2025-11-07 19:27:20 -08:00
mul_mat_reg_tile.tmpl.wgsl ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) 2025-11-07 19:27:20 -08:00
mul_mat_subgroup_matrix.tmpl.wgsl ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) 2025-11-07 19:27:20 -08:00
mul_mat_vec.tmpl.wgsl ggml webgpu: faster matrix multiplication/matrix-vector multiplication (#17031) 2025-11-07 19:27:20 -08:00
pad.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
rms_norm.wgsl ggml webgpu: actually add softmax, fix rms_norm offset (#16400) 2025-10-04 20:59:31 -07:00
rope.tmpl.wgsl model: add support for qwen3vl series (#16780) 2025-10-30 16:19:14 +01:00
scale.tmpl.wgsl ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187) 2025-09-30 09:57:51 -07:00
set_rows.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
soft_max.tmpl.wgsl ggml webgpu: actually add softmax, fix rms_norm offset (#16400) 2025-10-04 20:59:31 -07:00
sum_rows.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
unary.wgsl ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00