llama.cpp/ggml/src/ggml-webgpu
Zheyuan Chen bd90fc74c3
ggml-webgpu: improve flashAttention performance by software pipelining (#19151)
* webgpu : pipeline flash_attn Q/K loads in WGSL

* ggml-webgpu: unroll Q*K accumulation inner loop

* ggml-webgpu: vectorization

* ggml-webgpu: unrolling

* ggml-webgpu: remove redundant unrolling

* ggml-webgpu: restore the config

* ggml-webgpu: remove redundant comments

* ggml-webgpu: formatting

* ggml-webgpu: formatting and remove vectorization

* ggml-webgpu: remove unnecessary constants

* ggml-webgpu: change QKV buffer to read_write to pass validation

* ggml-webgpu: add explanation for the additional bracket around Q K accumulate

* Indentation and for -> if for tail

* Kick off CI on wgsl only commits

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
2026-01-29 14:05:30 -08:00
wgsl-shaders ggml-webgpu: improve flashAttention performance by software pipelining (#19151) 2026-01-29 14:05:30 -08:00
CMakeLists.txt ggml webgpu: add support for emscripten builds (#17184) 2025-12-03 10:25:34 +01:00
ggml-webgpu-shader-lib.hpp ggml webgpu: support for backend sampling (#18880) 2026-01-16 16:12:43 -08:00
ggml-webgpu.cpp ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976) 2026-01-27 20:53:36 -08:00
pre_wgsl.hpp ggml webgpu: initial flashattention implementation (#18610) 2026-01-08 08:23:39 -08:00