llama.cpp

Commit Graph

Author	SHA1	Message	Date
Reese Levine	a5da437098	Merge remote-tracking branch 'upstream/master'	2025-09-11 17:13:02 -07:00
Reese Levine	ff412050d8	Fix compilation	2025-09-11 17:05:27 -07:00
Reese Levine	4293531787	Refactor use of wg size entry	2025-09-11 16:56:38 -07:00
Reese Levine	dc7bc4a25a	Add get_rows implementation	2025-09-10 18:24:29 -07:00
Reese Levine	7fbe84cd5f	Implement rms_norm	2025-09-09 17:10:23 -07:00
Jeff Bolz	e68aa10d8f	vulkan: sort graph to allow more parallel execution (#15850) * vulkan: sort graph to allow more parallel execution Add a backend proc to allow the backend to modify the graph. The vulkan implementation looks at which nodes depend on each other and greedily reorders them to group together nodes that don't depend on each other. It only reorders the nodes, doesn't change the contents of any of them. With #15489, this reduces the number of synchronizations needed. * call optimize_graph per-split	2025-09-09 02:10:07 +08:00
Reese Levine	c10219705d	Get addition and multiplication working	2025-09-08 10:15:21 -07:00
Daniel Bevenius	3b15924d71	ggml WebGPU: remove userdata from request adapter callback (#15527) * ggml WebGPU: remove userdata from request adapter callback This commit removes the `userdata` parameter from the WebGPU request adapter callback in `ggml-webgpu.cpp`. Instead, the lambda function captures the `webgpu_context` directly. The motivation for this change is to simplify the code and improve readability. * inline the callback lambda into the RequestAdapter call This commit removes the callback lambda variable and inlines it directly into the RequestAdapter call.	2025-09-07 11:19:45 +03:00
Reese Levine	7f9ee10e75	Add templated addition, clean up code	2025-09-04 14:12:44 -07:00
Reese Levine	1b16a91183	Merge remote-tracking branch 'origin/master' into addition	2025-09-04 12:40:34 -07:00
Daniel Bevenius	77dee9de97	ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops (#15695) * ggml : WebGPU add TRANSPOSE and RESHAPE to supported ops This commit adds support for the TRANSPOSE and RESHAPE operations in the ggml webgpu backend. Co-authored-by: Diego Devesa <slarengh@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-09-01 14:28:49 +02:00
Reese Levine	45363632cb	ggml WebGPU: add support for quantization types (#15440) * Begin work on set_rows * Work on set rows * Add error buffers for reporting unsupported SET_ROWS indices * Remove extra comments * Work on templating for different types in shaders * Work on shader type generation * Working q4_0 mul_mat and some templating for different types * Add q4_0_f16 matmul and fix device init * Add matmul support for basic quantization types * Add q2_k and q3_k quantization * Add rest of k-quants * Get firt i-quant working * Closer to supporting all i-quants * Support rest of i-quants * Cleanup code * Fix python formatting * debug * Bugfix for memset * Add padding to end of buffers on creation * Simplify bit-shifting * Update usage of StringView	2025-08-22 11:28:03 -07:00
Reese Levine	5fd160bbd9	ggml: Add basic SET_ROWS support in WebGPU (#15137) * Begin work on set_rows * Work on set rows * Add error buffers for reporting unsupported SET_ROWS indices * Remove extra comments	2025-08-06 15:14:40 -07:00
Neha Abbas	ac522434c7	most recent merge	2025-08-06 13:37:52 -07:00
Reese Levine	4ad0986123	Remove extra comments	2025-08-06 12:25:55 -07:00
Reese Levine	248f7a512f	Add error buffers for reporting unsupported SET_ROWS indices	2025-08-06 10:15:50 -07:00
Reese Levine	b2dbfcdcb1	Work on set rows	2025-08-05 16:43:21 -07:00
Reese Levine	6a6135cc85	Begin work on set_rows	2025-08-05 16:43:05 -07:00
Reese Levine	9515c6131a	ggml: WebGPU disable SET_ROWS for now (#15078) * Add paramater buffer pool, batching of submissions, refactor command building/submission * Add header for linux builds * Free staged parameter buffers at once * Format with clang-format * Fix thread-safe implementation * Use device implicit synchronization * Update workflow to use custom release * Remove testing branch workflow * Disable set_rows until it's implemented * Fix potential issue around empty queue submission * Try synchronous submission * Try waiting on all futures explicitly * Add debug * Add more debug messages * Work on getting ssh access for debugging * Debug on failure * Disable other tests * Remove extra if * Try more locking * maybe passes? * test * Some cleanups * Restore build file * Remove extra testing branch ci	2025-08-05 16:26:38 -07:00
Neha Abbas	39aa11d9a4	f32 add all tests passing	2025-08-04 15:52:07 -05:00
Reese Levine	587d0118f5	ggml: WebGPU backend host improvements and style fixing (#14978) * Add parameter buffer pool, batching of submissions, refactor command building/submission * Add header for linux builds * Free staged parameter buffers at once * Format with clang-format * Fix thread-safe implementation * Use device implicit synchronization * Update workflow to use custom release * Remove testing branch workflow	2025-08-04 08:52:43 -07:00
Neha Abbas	96d107e505	some f32 tests passing	2025-08-01 14:35:20 -05:00
Reese Levine	cddda7e730	Use device implicit synchronization	2025-07-31 12:28:29 -07:00
Reese Levine	b8012ecc0a	Fix thread-safe implementation	2025-07-31 11:02:08 -07:00
Reese Levine	bfff27f130	Format with clang-format	2025-07-30 15:06:09 -07:00
Reese Levine	01c8ced232	Free staged parameter buffers at once	2025-07-30 14:27:29 -07:00
Reese Levine	04d7b272d6	Add header for linux builds	2025-07-30 13:45:58 -07:00
Reese Levine	30ba139e5b	Add paramater buffer pool, batching of submissions, refactor command building/submission	2025-07-30 12:33:06 -07:00
Reese Levine	21c021745d	ggml: Add initial WebGPU backend (#14521) * Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults * Initialize webgpu device * Making progress on setting up the backend * Finish more boilerplate/utility functions * Organize file and work on alloc buffer * Add webgpu_context to prepare for actually running some shaders * Work on memset and add shader loading * Work on memset polyfill * Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it * Implement get_tensor and buffer_clear * Finish rest of setup * Start work on compute graph * Basic mat mul working * Work on emscripten build * Basic WebGPU backend instructions * Use EMSCRIPTEN flag * Work on passing ci, implement 4d tensor multiplication * Pass thread safety test * Implement permuting for mul_mat and cpy * minor cleanups * Address feedback * Remove division by type size in cpy op * Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends * Fix name * Fix macos dawn prefix path	2025-07-16 18:18:51 +03:00

29 Commits