llama.cpp

Reese Levine	d006858316	ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278)	2026-04-03 11:40:14 -07:00
* Work towards removing bitcast
* Move rest of existing types over
* Add timeout back to wait and remove synchronous set_tensor/memset_tensor
* move to unpackf16 for wider compatibility
* cleanup
* Remove deadlock condition in free_bufs
* Start work on removing parameter buffer pools
* Simplify and optimize further
* simplify profile futures
* Fix stride
* Try using a single command buffer per batch
* formatting
wgsl-shaders	ggml-webgpu: add vectorized flash attention (#20709)	2026-04-02 10:40:42 -07:00
CMakeLists.txt	ggml webgpu: add support for emscripten builds (#17184)	2025-12-03 10:25:34 +01:00
ggml-webgpu-shader-lib.hpp	ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278)	2026-04-03 11:40:14 -07:00
ggml-webgpu.cpp	ggml-webgpu: move from parameter buffer pool to single buffer with offsets (#21278)	2026-04-03 11:40:14 -07:00
pre_wgsl.hpp	ggml webgpu: initial flashattention implementation (#18610)	2026-01-08 08:23:39 -08:00