llama.cpp/scripts
Reese Levine c201d0de77
Wasm (#9)
* webgpu : fix build on emscripten

* more debugging stuff

* test-backend-ops: force single thread on wasm

* fix single-thread case for init_tensor_uniform
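
The two single-thread fixes above matter because the test harness normally splits random tensor initialization across worker threads, which is not available on wasm without pthread support. A minimal sketch of that fallback shape, assuming illustrative names and chunking rather than the actual test-backend-ops internals:

```cpp
// Hedged sketch of a single-thread fallback for a threaded tensor fill:
// when only one thread is available (e.g. wasm without pthreads), fill the
// buffer directly instead of spawning workers. Names are illustrative.
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

void fill_uniform(float * data, size_t n, float lo, float hi, int n_threads) {
    auto fill_range = [&](size_t begin, size_t end, unsigned seed) {
        std::mt19937 rng(seed);
        std::uniform_real_distribution<float> dist(lo, hi);
        for (size_t i = begin; i < end; ++i) data[i] = dist(rng);
    };
    if (n_threads <= 1) {
        fill_range(0, n, /*seed=*/0);   // single-thread path: no std::thread at all
        return;
    }
    std::vector<std::thread> workers;
    const size_t chunk = (n + n_threads - 1) / n_threads;
    for (int t = 0; t < n_threads; ++t) {
        const size_t begin = t * chunk;
        const size_t end   = begin + chunk < n ? begin + chunk : n;
        if (begin >= end) break;
        workers.emplace_back(fill_range, begin, end, (unsigned) t);
    }
    for (auto & w : workers) w.join();
}
```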

* use jspi

* add pthread

* test: remember to set n_thread for cpu backend

* Add buffer label and enable Dawn-specific toggles to turn off some checks

* Intermediate state

* Fast working f16/f32 vec4

* Working float fast mul mat

* Clean up naming of mul_mat to match logical model, start work on q mul_mat

* Setup for subgroup matrix mat mul

* Basic working subgroup matrix

* Working subgroup matrix tiling

* Handle less common subgroup matrix sizes (still a multiple of the subgroup matrix size)
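
The subgroup matrix bullets above all build on the same tiling idea: the output matrix is split into fixed-size tiles, and each tile accumulates partial products over K in steps, with the inputs staged through shared memory. A minimal CPU sketch of that decomposition, assuming illustrative tile sizes rather than the shader's actual constants:

```cpp
// Conceptual CPU reference for the tiled mul_mat scheme the subgroup-matrix
// shader follows: C (M x N) is split into TILE_M x TILE_N output tiles, and
// each tile accumulates partial products over K in TILE_K steps.
// Tile sizes here are illustrative, not the shader's actual constants.
#include <cstddef>
#include <vector>

constexpr size_t TILE_M = 16, TILE_N = 16, TILE_K = 16;

void matmul_tiled(const std::vector<float> & A,   // M x K, row-major
                  const std::vector<float> & B,   // K x N, row-major
                  std::vector<float>       & C,   // M x N, row-major
                  size_t M, size_t N, size_t K) {
    // assumes M, N, K are multiples of the tile sizes (the "% sg matrix size" case)
    for (size_t m0 = 0; m0 < M; m0 += TILE_M) {
        for (size_t n0 = 0; n0 < N; n0 += TILE_N) {
            float acc[TILE_M][TILE_N] = {};  // per-tile accumulator (the subgroup matrix plays this role in the shader)
            for (size_t k0 = 0; k0 < K; k0 += TILE_K) {
                // in the shader this step is a cooperative load into shared memory
                for (size_t m = 0; m < TILE_M; ++m)
                    for (size_t k = 0; k < TILE_K; ++k)
                        for (size_t n = 0; n < TILE_N; ++n)
                            acc[m][n] += A[(m0 + m)*K + k0 + k] * B[(k0 + k)*N + n0 + n];
            }
            for (size_t m = 0; m < TILE_M; ++m)
                for (size_t n = 0; n < TILE_N; ++n)
                    C[(m0 + m)*N + n0 + n] = acc[m][n];
        }
    }
}
```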

* Initial working gemv

* Working f16 accumulation with shared-memory staging

* Print out available subgroup matrix configurations

* Vectorize dst stores for sg matrix shader

* Working scalar gemv
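
The scalar gemv above computes y = W * x for a single input column. A minimal reference of that operation, assuming a row-major W and a one-invocation-per-row mapping (both assumptions, not the shader's actual layout):

```cpp
// Minimal scalar reference for the gemv path: y = W * x, with W stored
// row-major (n_rows x n_cols). Mapping one invocation to one output row is
// an assumed work distribution, not the actual shader code.
#include <cstddef>
#include <vector>

void gemv_scalar(const std::vector<float> & W, const std::vector<float> & x,
                 std::vector<float> & y, size_t n_rows, size_t n_cols) {
    for (size_t r = 0; r < n_rows; ++r) {   // one "thread" per output row
        float acc = 0.0f;
        for (size_t c = 0; c < n_cols; ++c) {
            acc += W[r*n_cols + c] * x[c];
        }
        y[r] = acc;
    }
}
```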

* Minor set_rows optimization (#4)

* Updated optimization, fixed errors

* Non-vectorized version now dispatches one thread per element (see the dispatch sketch after this item)

* Simplify

* Change logic for set_rows pipelines

---------

Co-authored-by: Neha Abbas <nehaabbas@macbookpro.lan>
Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>
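
For the set_rows change above (one thread per element in the non-vectorized path), the dispatch size reduces to a ceiling division of the element count by the workgroup size. A hedged sketch, assuming an illustrative workgroup size and names rather than the actual pipeline constants:

```cpp
// Hedged sketch of the dispatch arithmetic for a one-thread-per-element
// set_rows: round the element count up to whole workgroups. Threads past
// n_elements are expected to bounds-check and return in the shader.
#include <cstdint>

constexpr uint32_t WG_SIZE = 256;  // assumed workgroup size

static uint32_t ceil_div(uint32_t a, uint32_t b) {
    return (a + b - 1) / b;
}

// number of workgroups to dispatch so every element gets exactly one thread
uint32_t set_rows_workgroups(uint32_t n_elements) {
    return ceil_div(n_elements, WG_SIZE);
}
```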

* Comment on Dawn toggles

* Working subgroup matrix code for (semi)generic sizes

* Remove some comments

* Cleanup code

* Update Dawn version and move to portable subgroup size

* Try to fix the new Dawn release

* Update subgroup size comment

* Only check for subgroup matrix configs if they are supported

* Add toggles for subgroup matrix/f16 support on nvidia+vulkan

* Make row/col naming consistent

* Refactor shared memory loading

* Move sg matrix stores to correct file

* Working q4_0
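
q4_0 here is ggml's 4-bit block format: each block carries one scale and 32 packed 4-bit quants, and a weight is recovered as (q - 8) * d. A sketch of that dequantization in plain C++, using a float scale in place of the format's f16 to stay self-contained:

```cpp
// Sketch of q4_0 dequantization as the shader has to perform it: each block
// holds one scale and 32 packed 4-bit quants (low nibbles first, then high
// nibbles), and each weight is (q - 8) * d. The real block stores the scale
// as f16; a plain float is used here to keep the sketch self-contained.
#include <cstdint>

constexpr int QK4_0 = 32;

struct block_q4_0_ref {
    float   d;               // scale (f16 in the actual format)
    uint8_t qs[QK4_0 / 2];   // 32 quants packed two per byte
};

void dequantize_q4_0(const block_q4_0_ref & b, float out[QK4_0]) {
    for (int j = 0; j < QK4_0 / 2; ++j) {
        const int x0 = (b.qs[j] & 0x0F) - 8;  // low nibble  -> element j
        const int x1 = (b.qs[j] >> 4)  - 8;   // high nibble -> element j + 16
        out[j]           = x0 * b.d;
        out[j + QK4_0/2] = x1 * b.d;
    }
}
```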

* Formatting

* Work with emscripten builds

* Fix test-backend-ops emscripten for f16/quantized types

* Use emscripten memory64 to support get_memory

* Add build flags and try ci

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-11-11 15:34:14 -08:00
Name | Last commit | Date
apple | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
jinja | scripts : add Jinja tester PySide6 simple app (#15756) | 2025-09-05 01:05:12 +02:00
snapdragon | Hexagon Op queue & dispatch optimizations (#16820) | 2025-10-29 06:29:12 -07:00
bench-models.sh | scripts : add script to bench models (#16894) | 2025-11-02 00:15:31 +02:00
build-info.sh | llama : reorganize source code + improve CMake (#8006) | 2024-06-26 18:33:02 +03:00
check-requirements.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
compare-commits.sh | scripts: add sqlite3 check for compare-commits.sh (#15633) | 2025-08-28 19:23:22 +08:00
compare-llama-bench.py | scripts: strip "AMD Instinct" from GPU name (#15668) | 2025-08-29 22:04:08 +02:00
create_ops_docs.py | Docs: add instructions for adding backends (#14889) | 2025-07-27 09:36:43 +08:00
debug-test.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
fetch_server_test_models.py | llama : move end-user examples to tools directory (#13249) | 2025-05-02 20:27:13 +02:00
gen-authors.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
gen-unicode-data.py | py : type-check all Python scripts with Pyright (#8341) | 2024-07-07 15:04:39 -04:00
get-flags.mk | build : pass all warning flags to nvcc via -Xcompiler (#5570) | 2024-02-18 16:21:52 -05:00
get-hellaswag.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
get-pg.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
get-wikitext-2.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
get-wikitext-103.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
get-winogrande.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
get_chat_template.py | scripts: corrected encoding when getting chat template (#11866) (#11907) | 2025-02-18 10:30:16 +01:00
hf.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
install-oneapi.bat | support SYCL backend windows build (#5208) | 2024-01-31 08:08:07 +05:30
serve-static.js | Wasm (#9) | 2025-11-11 15:34:14 -08:00
server-bench.py | llama: use FA + max. GPU layers by default (#15434) | 2025-08-30 16:32:10 +02:00
sync-ggml-am.sh | scripts : update sync scripts | 2025-08-18 22:06:44 +03:00
sync-ggml.last | sync : ggml | 2025-11-05 10:41:51 +02:00
sync-ggml.sh | scripts : update sync scripts | 2025-08-18 22:06:44 +03:00
sync_vendor.py | sync : vendor (#13901) | 2025-05-30 16:25:45 +03:00
tool_bench.py | server : speed up tests (#15836) | 2025-09-06 14:45:24 +02:00
tool_bench.sh | scripts : make the shell scripts cross-platform (#14341) | 2025-06-30 10:17:18 +02:00
verify-checksum-models.py | convert.py : add python logging instead of print() (#6511) | 2024-05-03 22:36:41 +03:00
xxd.cmake | llama : move end-user examples to tools directory (#13249) | 2025-05-02 20:27:13 +02:00