llama.cpp/ggml/src

Latest commit: 24e86cae72 by Jeff Bolz, "vulkan: KHR_coopmat flash attention" (#13506)
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.
2025-05-14 11:55:26 +02:00
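The commit message above describes flash attention as two tiled matrix multiplies, Q*K^T (which the shader accelerates with KHR_coopmat) and P*V (left on the scalar path), combined with an online softmax. As a rough illustration of that structure only, here is a minimal NumPy sketch of the tiled algorithm; it is not the shader's implementation, and all names (`flash_attention`, `block`, the accumulator variables) are illustrative:

```python
import numpy as np

def flash_attention(Q, K, V, block=4):
    """Flash-attention sketch: iterate over K/V tiles, keeping a running
    row max and softmax denominator so the full score matrix is never
    materialized. Illustrative only, not the ggml-vulkan shader."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)  # output accumulator
    m = np.full(n, -np.inf)                 # running row maximum
    l = np.zeros(n)                         # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kb, Vb = K[j:j + block], V[j:j + block]
        S = (Q @ Kb.T) * scale              # Q*K^T tile (coopmat-accelerated in the shader)
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])      # unnormalized softmax tile
        alpha = np.exp(m - m_new)           # rescale previously accumulated tiles
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ Vb     # P*V tile (scalar path in the shader)
        m = m_new
    return O / l[:, None]
```

The result matches a naive softmax-attention computation; the tiling only changes the order of operations, which is what makes per-tile matrix-multiply hardware such as coopmat applicable to the Q*K^T step.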
ggml-blas           | ggml : add support for dynamic loading of backends (#10469)                                          | 2024-11-25 15:13:39 +01:00
ggml-cann           | CANN: Add support for async operator submission (#12864)                                             | 2025-04-17 20:34:16 +08:00
ggml-cpu            | ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)                                 | 2025-05-13 18:02:28 +03:00
ggml-cuda           | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
ggml-hip            | CUDA/HIP: Share the same unified memory allocation logic. (#12934)                                    | 2025-04-15 11:20:38 +02:00
ggml-kompute        | llama : add Qwen2VL support + multimodal RoPE (#10361)                                                | 2024-12-14 14:43:46 +02:00
ggml-metal          | metal : use FA-vec kernel up to batch size 20 (#13496)                                                | 2025-05-13 18:04:39 +03:00
ggml-musa           | cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)                                              | 2025-03-17 20:25:13 +02:00
ggml-opencl         | opencl: remove unnecessary assert for `add` (#13257)                                                  | 2025-05-12 13:13:49 -07:00
ggml-rpc            | rpc : add rpc_msg_set_tensor_hash_req (#13353)                                                        | 2025-05-09 10:31:07 +03:00
ggml-sycl           | enable dpcpp nightly builds with libraries (#13406)                                                   | 2025-05-12 13:15:32 +08:00
ggml-vulkan         | vulkan: KHR_coopmat flash attention (#13506)                                                          | 2025-05-14 11:55:26 +02:00
CMakeLists.txt      | cmake : removed stdc++fs (whisper/3097)                                                               | 2025-05-07 17:28:36 +03:00
ggml-alloc.c        | ggml: Don't assert fail when tensor data changes (#13222)                                             | 2025-05-01 22:46:10 +02:00
ggml-backend-impl.h | ggml : upgrade init_tensor API to return a ggml_status (#11854)                                       | 2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp| ggml-backend : fix backend search path (#12330)                                                       | 2025-03-11 14:25:17 +01:00
ggml-backend.cpp    | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
ggml-common.h       | musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (#12611)           | 2025-03-30 10:59:38 +02:00
ggml-impl.h         | ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187)                           | 2025-04-11 00:17:47 +03:00
ggml-opt.cpp        | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
ggml-quants.c       | whisper: remove MSVC warnings pragmas (whisper/3090)                                                  | 2025-05-07 17:28:36 +03:00
ggml-quants.h       | ggml : build backends as libraries (#10256)                                                           | 2024-11-14 18:04:35 +01:00
ggml-threading.cpp  | ggml : build backends as libraries (#10256)                                                           | 2024-11-14 18:04:35 +01:00
ggml-threading.h    | remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)                                                      | 2024-12-12 19:02:49 +01:00
ggml.c              | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
gguf.cpp            | Fix clang warning in gguf_check_reserved_keys (#12686)                                                | 2025-04-01 13:12:53 +02:00