llama.cpp


Zheyuan Chen bd90fc74c3 ggml-webgpu: improve flastAttention performance by software pipelining (#19151)	2026-01-29 14:05:30 -08:00
* webgpu : pipeline flash_attn Q/K loads in WGSL
* ggml-webgpu: unroll QK accumlation inner loop
  ggml-webgpu: vectorization
* ggml-webgpu: unrolling
* ggml-webgpu: remove redundant unrolling
* ggml-webgpu: restore the config
* ggml-webgpu: remove redundant comments
* ggml-webgpu: formatting
* ggml-webgpu: formatting and remove vectorization
* ggml-webgpu: remove unnecessary constants
* ggml-webgpu: change QKV buffer to read_write to pass validation
* ggml-webgpu: add explanation for the additional bracket around Q K accumulate
* Indentation and for -> if for tail
* Kick off CI on wgsl only commits
---------
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
wgsl-shaders	ggml-webgpu: improve flastAttention performance by software pipelining (#19151)	2026-01-29 14:05:30 -08:00
CMakeLists.txt	ggml webgpu: add support for emscripten builds (#17184)	2025-12-03 10:25:34 +01:00
ggml-webgpu-shader-lib.hpp	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
ggml-webgpu.cpp	ggml webgpu: Split shared state (webgpu_context) into global state and per-thread state (#18976)	2026-01-27 20:53:36 -08:00
pre_wgsl.hpp	ggml webgpu: initial flashattention implementation (#18610)	2026-01-08 08:23:39 -08:00