llama.cpp/.github
Zheyuan Chen bd90fc74c3
ggml-webgpu: improve flashAttention performance by software pipelining (#19151)
* webgpu : pipeline flash_attn Q/K loads in WGSL (see the first sketch after this list)

* ggml-webgpu: unroll Q*K accumulation inner loop (see the second sketch after this list)

* ggml-webgpu: vectorization

* ggml-webgpu: unrolling

* ggml-webgpu: remove redundant unrolling

* ggml-webgpu: restore the config

* ggml-webgpu: remove redundant comments

* ggml-webgpu: formatting

* ggml-webgpu: formatting and remove vectorization

* ggml-webgpu: remove unnecessary constants

* ggml-webgpu: change QKV buffer to read_write to pass validation

* ggml-webgpu: add explanation for the additional bracket around the Q*K accumulation

* Fix indentation and convert the tail-handling loop from for to if

* Kick off CI on WGSL-only commits
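
For context, a minimal, hypothetical WGSL sketch of the software-pipelining idea in the commit title: the load for the next K row is issued before the current row's products are computed, so memory latency overlaps with arithmetic. The buffer layout, the `D` and `N_KV` constants, and the kernel shape are illustrative assumptions, not the shader from this PR; the storage buffers are declared `read_write`, echoing the validation fix listed above.

```wgsl
// Hypothetical, simplified kernel: one workgroup computes partial Q*K
// products for a single query against N_KV key rows. Layout is assumed.

@group(0) @binding(0) var<storage, read_write> q : array<f32>;        // D floats
@group(0) @binding(1) var<storage, read_write> k : array<f32>;        // N_KV * D floats
@group(0) @binding(2) var<storage, read_write> partials : array<f32>; // N_KV * D floats

const D    : u32 = 64u;   // head dimension (assumed)
const N_KV : u32 = 256u;  // number of key rows (assumed)

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) lid : vec3<u32>) {
    let t = lid.x; // one lane per element of the head dimension

    // Prologue: fetch row 0 before the loop starts consuming it.
    var k_cur : f32 = k[t];

    for (var row : u32 = 0u; row < N_KV; row = row + 1u) {
        // Software pipelining: issue the *next* row's load now, so it is
        // in flight while the multiply below executes on the current row.
        var k_next : f32 = 0.0;
        if (row + 1u < N_KV) {
            k_next = k[(row + 1u) * D + t];
        }

        // Consume the value that was fetched one iteration ago.
        partials[row * D + t] = q[t] * k_cur;

        k_cur = k_next;
    }
    // A real kernel would reduce partials[] across lanes; omitted here.
}
```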
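A second hedged sketch covers the unrolling-related commits: the dot-product accumulation is split across four independent accumulators, the remainder is handled with if statements rather than a trailing for loop, and the final combine is explicitly parenthesized. The `q`/`k` bindings and `D` are again assumptions for illustration; only the loop structure is the point.

```wgsl
// Hypothetical sketch of the unrolled Q*K accumulation. Bindings and D
// are assumed; this is not the PR's actual shader code.

@group(0) @binding(0) var<storage, read_write> q : array<f32>;
@group(0) @binding(1) var<storage, read_write> k : array<f32>;

const D : u32 = 64u; // head dimension (assumed)

fn qk_dot(q_off : u32, k_off : u32) -> f32 {
    // Four independent accumulators let the compiler keep several
    // multiply-adds in flight instead of serializing on one register.
    var acc0 : f32 = 0.0;
    var acc1 : f32 = 0.0;
    var acc2 : f32 = 0.0;
    var acc3 : f32 = 0.0;

    var i : u32 = 0u;
    // Unrolled main body: four elements per step.
    for (; i + 4u <= D; i = i + 4u) {
        acc0 = acc0 + q[q_off + i]      * k[k_off + i];
        acc1 = acc1 + q[q_off + i + 1u] * k[k_off + i + 1u];
        acc2 = acc2 + q[q_off + i + 2u] * k[k_off + i + 2u];
        acc3 = acc3 + q[q_off + i + 3u] * k[k_off + i + 3u];
    }
    // Tail as if statements rather than a for loop ("for -> if for tail"):
    // at most three elements can remain.
    if (i < D) { acc0 = acc0 + q[q_off + i] * k[k_off + i]; i = i + 1u; }
    if (i < D) { acc1 = acc1 + q[q_off + i] * k[k_off + i]; i = i + 1u; }
    if (i < D) { acc2 = acc2 + q[q_off + i] * k[k_off + i]; }

    // Explicit brackets pin the association order of the partial sums
    // (cf. the "additional bracket around the Q*K accumulation" commit).
    return (acc0 + acc1) + (acc2 + acc3);
}
```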

---------

Co-authored-by: Reese Levine <reeselevine1@gmail.com>
2026-01-29 14:05:30 -08:00
| Name | Last commit | Last updated |
|---|---|---|
| ISSUE_TEMPLATE | github: update issue templates [no ci] (#18410) | 2025-12-28 10:50:56 +01:00 |
| actions | ci : remove libcurl in releases (#18775) | 2026-01-12 21:43:02 +01:00 |
| workflows | ggml-webgpu: improve flashAttention performance by software pipelining (#19151) | 2026-01-29 14:05:30 -08:00 |
| labeler.yml | ci : add label for jinja changes (#18903) | 2026-01-17 21:52:02 +01:00 |
| pull_request_template.md | repo : update links to new url (#11886) | 2025-02-15 16:40:57 +02:00 |