29 Commits

Author SHA1 Message Date
Reese Levine a5da437098 Merge remote-tracking branch 'upstream/master' 2025-09-11 17:13:02 -07:00
Reese Levine ff412050d8 Fix compilation 2025-09-11 17:05:27 -07:00
Reese Levine 4293531787 Refactor use of wg size entry 2025-09-11 16:56:38 -07:00
Reese Levine dc7bc4a25a Add get_rows implementation 2025-09-10 18:24:29 -07:00
Reese Levine 7fbe84cd5f Implement rms_norm 2025-09-09 17:10:23 -07:00
Jeff Bolz e68aa10d8f
vulkan: sort graph to allow more parallel execution (#15850)
* vulkan: sort graph to allow more parallel execution

Add a backend proc to allow the backend to modify the graph. The
vulkan implementation looks at which nodes depend on each other
and greedily reorders them to group together nodes that don't
depend on each other. It only reorders the nodes, doesn't change
the contents of any of them.

With #15489, this reduces the number of synchronizations needed.

* call optimize_graph per-split
2025-09-09 02:10:07 +08:00
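
The reordering described in the commit above can be illustrated with a small sketch. This is not the Vulkan backend's code: `Node`, `deps`, and `greedy_reorder` are hypothetical names, and the sketch assumes the input is already in a valid execution order with dependencies stored as node indices. It groups nodes that do not depend on each other and changes only the order, never the node contents.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node type: deps holds the indices of the nodes this one reads from.
struct Node {
    std::vector<std::size_t> deps;
};

// Returns a new ordering of node indices. Each pass collects every unplaced
// node whose dependencies were all emitted in earlier passes, so the nodes
// within one pass are independent of each other.
std::vector<std::size_t> greedy_reorder(const std::vector<Node>& nodes) {
    const std::size_t n = nodes.size();
    std::vector<bool> placed(n, false);  // true once a node has been emitted
    std::vector<std::size_t> order;
    order.reserve(n);

    while (order.size() < n) {
        std::vector<std::size_t> group;
        for (std::size_t i = 0; i < n; ++i) {
            if (placed[i]) continue;
            bool ready = true;
            for (std::size_t d : nodes[i].deps) {
                assert(d < n);  // dependency indices must be in range
                if (!placed[d]) { ready = false; break; }
            }
            if (ready) group.push_back(i);
        }
        if (group.empty()) {
            // Assumption violated (cyclic input): keep the rest in original order.
            for (std::size_t i = 0; i < n; ++i) {
                if (!placed[i]) order.push_back(i);
            }
            break;
        }
        // Mark the group as placed only after it is complete, so members of
        // the same group can never depend on one another.
        for (std::size_t i : group) {
            placed[i] = true;
            order.push_back(i);
        }
    }
    return order;
}
```

For four nodes where node 1 reads node 0 and node 3 reads node 2, the result is 0, 2, 1, 3: the two independent nodes are emitted together, followed by their two consumers, which is the kind of grouping that lets a backend place fewer synchronization points.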
Reese Levine c10219705d Get addition and multiplication working 2025-09-08 10:15:21 -07:00
Daniel Bevenius 3b15924d71
ggml WebGPU: remove userdata from request adapter callback (#15527)
* ggml WebGPU: remove userdata from request adapter callback

This commit removes the `userdata` parameter from the WebGPU request
adapter callback in `ggml-webgpu.cpp`. Instead, the lambda function
captures the `webgpu_context` directly.

The motivation for this change is to simplify the code and improve
readability.

* inline the callback lambda into the RequestAdapter call

This commit removes the callback lambda variable and inlines it directly
into the RequestAdapter call.
2025-09-07 11:19:45 +03:00
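
The change described in the commit above follows a general C++ pattern, sketched here with stand-in types. Nothing in this block is the real WebGPU or ggml API: `Context`, `request_adapter_raw`, and `request_adapter` are hypothetical, and the mock functions invoke their callback immediately with a fixed adapter name.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the backend context.
struct Context {
    std::string adapter_name;
};

// Old style (mock): C-style callback plus an opaque userdata pointer.
using RawCallback = void (*)(const char* adapter, void* userdata);
void request_adapter_raw(RawCallback cb, void* userdata) {
    cb("mock-adapter", userdata);
}

// New style (mock): any callable; state travels in the lambda capture.
void request_adapter(const std::function<void(const char*)>& cb) {
    cb("mock-adapter");
}

// Before: the context is passed through userdata and cast back inside the callback.
void init_with_userdata(Context& ctx) {
    request_adapter_raw(
        [](const char* adapter, void* userdata) {
            static_cast<Context*>(userdata)->adapter_name = adapter;
        },
        &ctx);
}

// After: the lambda is written inline at the call site and captures the context.
void init_with_capture(Context& ctx) {
    request_adapter([&ctx](const char* adapter) { ctx.adapter_name = adapter; });
}
```

Both functions leave the same value in `ctx.adapter_name`; the capturing version drops the `void*` cast and keeps the callback next to the call that uses it.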
Reese Levine 7f9ee10e75 Add templated addition, clean up code 2025-09-04 14:12:44 -07:00
Reese Levine 1b16a91183 Merge remote-tracking branch 'origin/master' into addition 2025-09-04 12:40:34 -07:00
Daniel Bevenius 77dee9de97
ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695)
* ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops

This commit adds support for the TRANSPOSE and RESHAPE operations in the
ggml webgpu backend.

Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-01 14:28:49 +02:00
Reese Levine 45363632cb
ggml WebGPU: add support for quantization types (#15440)
* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

* Work on templating for different types in shaders

* Work on shader type generation

* Working q4_0 mul_mat and some templating for different types

* Add q4_0_f16 matmul and fix device init

* Add matmul support for basic quantization types

* Add q2_k and q3_k quantization

* Add rest of k-quants

* Get first i-quant working

* Closer to supporting all i-quants

* Support rest of i-quants

* Cleanup code

* Fix python formatting

* debug

* Bugfix for memset

* Add padding to end of buffers on creation

* Simplify bit-shifting

* Update usage of StringView
2025-08-22 11:28:03 -07:00
Reese Levine 5fd160bbd9
ggml: Add basic SET_ROWS support in WebGPU (#15137)
* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments
2025-08-06 15:14:40 -07:00
Neha Abbas ac522434c7 most recent merge 2025-08-06 13:37:52 -07:00
Reese Levine 4ad0986123 Remove extra comments 2025-08-06 12:25:55 -07:00
Reese Levine 248f7a512f Add error buffers for reporting unsupported SET_ROWS indices 2025-08-06 10:15:50 -07:00
Reese Levine b2dbfcdcb1 Work on set rows 2025-08-05 16:43:21 -07:00
Reese Levine 6a6135cc85 Begin work on set_rows 2025-08-05 16:43:05 -07:00
Reese Levine 9515c6131a
ggml: WebGPU disable SET_ROWS for now (#15078)
* Add parameter buffer pool, batching of submissions, refactor command building/submission

* Add header for linux builds

* Free staged parameter buffers at once

* Format with clang-format

* Fix thread-safe implementation

* Use device implicit synchronization

* Update workflow to use custom release

* Remove testing branch workflow

* Disable set_rows until it's implemented

* Fix potential issue around empty queue submission

* Try synchronous submission

* Try waiting on all futures explicitly

* Add debug

* Add more debug messages

* Work on getting ssh access for debugging

* Debug on failure

* Disable other tests

* Remove extra if

* Try more locking

* maybe passes?

* test

* Some cleanups

* Restore build file

* Remove extra testing branch ci
2025-08-05 16:26:38 -07:00
Neha Abbas 39aa11d9a4 f32 add all tests passing 2025-08-04 15:52:07 -05:00
Reese Levine 587d0118f5
ggml: WebGPU backend host improvements and style fixing (#14978)
* Add parameter buffer pool, batching of submissions, refactor command building/submission

* Add header for linux builds

* Free staged parameter buffers at once

* Format with clang-format

* Fix thread-safe implementation

* Use device implicit synchronization

* Update workflow to use custom release

* Remove testing branch workflow
2025-08-04 08:52:43 -07:00
Neha Abbas 96d107e505 some f32 tests passing 2025-08-01 14:35:20 -05:00
Reese Levine cddda7e730 Use device implicit synchronization 2025-07-31 12:28:29 -07:00
Reese Levine b8012ecc0a Fix thread-safe implementation 2025-07-31 11:02:08 -07:00
Reese Levine bfff27f130 Format with clang-format 2025-07-30 15:06:09 -07:00
Reese Levine 01c8ced232 Free staged parameter buffers at once 2025-07-30 14:27:29 -07:00
Reese Levine 04d7b272d6 Add header for linux builds 2025-07-30 13:45:58 -07:00
Reese Levine 30ba139e5b Add parameter buffer pool, batching of submissions, refactor command building/submission 2025-07-30 12:33:06 -07:00
Reese Levine 21c021745d
ggml: Add initial WebGPU backend (#14521)
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults

* Initialize webgpu device

* Making progress on setting up the backend

* Finish more boilerplate/utility functions

* Organize file and work on alloc buffer

* Add webgpu_context to prepare for actually running some shaders

* Work on memset and add shader loading

* Work on memset polyfill

* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it

* Implement get_tensor and buffer_clear

* Finish rest of setup

* Start work on compute graph

* Basic mat mul working

* Work on emscripten build

* Basic WebGPU backend instructions

* Use EMSCRIPTEN flag

* Work on passing ci, implement 4d tensor multiplication

* Pass thread safety test

* Implement permuting for mul_mat and cpy

* minor cleanups

* Address feedback

* Remove division by type size in cpy op

* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends

* Fix name

* Fix macos dawn prefix path
2025-07-16 18:18:51 +03:00