llama.cpp/.github
Zheyuan Chen bd90fc74c3
ggml-webgpu: improve flashAttention performance by software pipelining (#19151)
* webgpu : pipeline flash_attn Q/K loads in WGSL (see the first sketch after this list)

* ggml-webgpu: unroll Q*K accumulation inner loop (see the second sketch after this list)

* ggml-webgpu: vectorization

* ggml-webgpu: unrolling

* ggml-webgpu: remove redundant unrolling

* ggml-webgpu: restore the config

* ggml-webgpu: remove redundant comments

* ggml-webgpu: formatting

* ggml-webgpu: formatting and remove vectorization

* ggml-webgpu: remove unnecessary constants

* ggml-webgpu: change QKV buffer to read_write to pass validation

* ggml-webgpu: add explanation for the additional bracket around the Q*K accumulation

* Fix indentation and convert the tail-handling loop from for to if

* Kick off CI on WGSL-only commits
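
For context, a minimal, hypothetical WGSL sketch of the software-pipelining idea in the commit title: the load for the next K row is issued before the current row's products are computed, so memory latency overlaps with arithmetic. The buffer layout, the `D` and `N_KV` constants, and the kernel shape are illustrative assumptions, not the shader from this PR; the storage buffers are declared `read_write`, echoing the validation fix listed above.

```wgsl
// Hypothetical, simplified kernel: one workgroup computes partial Q*K
// products for a single query against N_KV key rows. Layout is assumed.

@group(0) @binding(0) var<storage, read_write> q : array<f32>;        // D floats
@group(0) @binding(1) var<storage, read_write> k : array<f32>;        // N_KV * D floats
@group(0) @binding(2) var<storage, read_write> partials : array<f32>; // N_KV * D floats

const D    : u32 = 64u;   // head dimension (assumed)
const N_KV : u32 = 256u;  // number of key rows (assumed)

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) lid : vec3<u32>) {
    let t = lid.x; // one lane per element of the head dimension

    // Prologue: fetch row 0 before the loop starts consuming it.
    var k_cur : f32 = k[t];

    for (var row : u32 = 0u; row < N_KV; row = row + 1u) {
        // Software pipelining: issue the *next* row's load now, so it is
        // in flight while the multiply below executes on the current row.
        var k_next : f32 = 0.0;
        if (row + 1u < N_KV) {
            k_next = k[(row + 1u) * D + t];
        }

        // Consume the value that was fetched one iteration ago.
        partials[row * D + t] = q[t] * k_cur;

        k_cur = k_next;
    }
    // A real kernel would reduce partials[] across lanes; omitted here.
}
```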
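A second hedged sketch covers the unrolling-related commits: the dot-product accumulation is split across four independent accumulators, the remainder is handled with if statements rather than a trailing for loop, and the final combine is explicitly parenthesized. The `q`/`k` bindings and `D` are again assumptions for illustration; only the loop structure is the point.

```wgsl
// Hypothetical sketch of the unrolled Q*K accumulation. Bindings and D
// are assumed; this is not the PR's actual shader code.

@group(0) @binding(0) var<storage, read_write> q : array<f32>;
@group(0) @binding(1) var<storage, read_write> k : array<f32>;

const D : u32 = 64u; // head dimension (assumed)

fn qk_dot(q_off : u32, k_off : u32) -> f32 {
    // Four independent accumulators let the compiler keep several
    // multiply-adds in flight instead of serializing on one register.
    var acc0 : f32 = 0.0;
    var acc1 : f32 = 0.0;
    var acc2 : f32 = 0.0;
    var acc3 : f32 = 0.0;

    var i : u32 = 0u;
    // Unrolled main body: four elements per step.
    for (; i + 4u <= D; i = i + 4u) {
        acc0 = acc0 + q[q_off + i]      * k[k_off + i];
        acc1 = acc1 + q[q_off + i + 1u] * k[k_off + i + 1u];
        acc2 = acc2 + q[q_off + i + 2u] * k[k_off + i + 2u];
        acc3 = acc3 + q[q_off + i + 3u] * k[k_off + i + 3u];
    }
    // Tail as if statements rather than a for loop ("for -> if for tail"):
    // at most three elements can remain.
    if (i < D) { acc0 = acc0 + q[q_off + i] * k[k_off + i]; i = i + 1u; }
    if (i < D) { acc1 = acc1 + q[q_off + i] * k[k_off + i]; i = i + 1u; }
    if (i < D) { acc2 = acc2 + q[q_off + i] * k[k_off + i]; }

    // Explicit brackets pin the association order of the partial sums
    // (cf. the "additional bracket around the Q*K accumulation" commit).
    return (acc0 + acc1) + (acc2 + acc3);
}
```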

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
2026-01-29 14:05:30 -08:00
| Name | Last commit | Last updated |
|---|---|---|
| ISSUE_TEMPLATE | github: update issue templates [no ci] (#18410) | 2025-12-28 10:50:56 +01:00 |
| actions | ci : remove libcurl in releases (#18775) | 2026-01-12 21:43:02 +01:00 |
| workflows | ggml-webgpu: improve flashAttention performance by software pipelining (#19151) | 2026-01-29 14:05:30 -08:00 |
| labeler.yml | ci : add label for jinja changes (#18903) | 2026-01-17 21:52:02 +01:00 |
| pull_request_template.md | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |