llama.cpp/ggml/src

Latest commit: 24e86cae72 by Jeff Bolz, "vulkan: KHR_coopmat flash attention" (#13506)
This shader uses coopmat1 to do the Q*K^T multiply. The P*V multiply is more
difficult for various reasons so I haven't done it. Performance for this
shader is around 2.5x better than for the scalar shader when doing prompt
processing. Some of the benefit may be from other optimizations like staging
through shared memory, or splitting by rows.
2025-05-14 11:55:26 +02:00
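The commit message above describes flash attention as two tiled matrix multiplies, Q*K^T (which the shader accelerates with KHR_coopmat) and P*V (left on the scalar path), combined with an online softmax. As a rough illustration of that structure only, here is a minimal NumPy sketch of the tiled algorithm; it is not the shader's implementation, and all names (`flash_attention`, `block`, the accumulator variables) are illustrative:

```python
import numpy as np

def flash_attention(Q, K, V, block=4):
    """Flash-attention sketch: iterate over K/V tiles, keeping a running
    row max and softmax denominator so the full score matrix is never
    materialized. Illustrative only, not the ggml-vulkan shader."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(Q, dtype=np.float64)  # output accumulator
    m = np.full(n, -np.inf)                 # running row maximum
    l = np.zeros(n)                         # running softmax denominator
    for j in range(0, K.shape[0], block):
        Kb, Vb = K[j:j + block], V[j:j + block]
        S = (Q @ Kb.T) * scale              # Q*K^T tile (coopmat-accelerated in the shader)
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])      # unnormalized softmax tile
        alpha = np.exp(m - m_new)           # rescale previously accumulated tiles
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ Vb     # P*V tile (scalar path in the shader)
        m = m_new
    return O / l[:, None]
```

The result matches a naive softmax-attention computation; the tiling only changes the order of operations, which is what makes per-tile matrix-multiply hardware such as coopmat applicable to the Q*K^T step.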
ggml-blas           | ggml : add support for dynamic loading of backends (#10469)                                          | 2024-11-25 15:13:39 +01:00
ggml-cann           | CANN: Add support for async operator submission (#12864)                                             | 2025-04-17 20:34:16 +08:00
ggml-cpu            | ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)                                 | 2025-05-13 18:02:28 +03:00
ggml-cuda           | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
ggml-hip            | CUDA/HIP: Share the same unified memory allocation logic. (#12934)                                    | 2025-04-15 11:20:38 +02:00
ggml-kompute        | llama : add Qwen2VL support + multimodal RoPE (#10361)                                                | 2024-12-14 14:43:46 +02:00
ggml-metal          | metal : use FA-vec kernel up to batch size 20 (#13496)                                                | 2025-05-13 18:04:39 +03:00
ggml-musa           | cuda : enable CUDA Graph on CUDA Toolkit < 12.x (#12394)                                              | 2025-03-17 20:25:13 +02:00
ggml-opencl         | opencl: remove unnecessary assert for `add` (#13257)                                                  | 2025-05-12 13:13:49 -07:00
ggml-rpc            | rpc : add rpc_msg_set_tensor_hash_req (#13353)                                                        | 2025-05-09 10:31:07 +03:00
ggml-sycl           | enable dpcpp nightly builds with libraries (#13406)                                                   | 2025-05-12 13:15:32 +08:00
ggml-vulkan         | vulkan: KHR_coopmat flash attention (#13506)                                                          | 2025-05-14 11:55:26 +02:00
CMakeLists.txt      | cmake : removed stdc++fs (whisper/3097)                                                               | 2025-05-07 17:28:36 +03:00
ggml-alloc.c        | ggml: Don't assert fail when tensor data changes (#13222)                                             | 2025-05-01 22:46:10 +02:00
ggml-backend-impl.h | ggml : upgrade init_tensor API to return a ggml_status (#11854)                                       | 2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp| ggml-backend : fix backend search path (#12330)                                                       | 2025-03-11 14:25:17 +01:00
ggml-backend.cpp    | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
ggml-common.h       | musa: fix all warnings, re-enable `-DLLAMA_FATAL_WARNINGS=ON` in ci and update doc (#12611)           | 2025-03-30 10:59:38 +02:00
ggml-impl.h         | ggml: don't include arm_neon.h when using CUDA 12 with ARM Neon (ggml/1187)                           | 2025-04-11 00:17:47 +03:00
ggml-opt.cpp        | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
ggml-quants.c       | whisper: remove MSVC warnings pragmas (whisper/3090)                                                  | 2025-05-07 17:28:36 +03:00
ggml-quants.h       | ggml : build backends as libraries (#10256)                                                           | 2024-11-14 18:04:35 +01:00
ggml-threading.cpp  | ggml : build backends as libraries (#10256)                                                           | 2024-11-14 18:04:35 +01:00
ggml-threading.h    | remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797)                                                      | 2024-12-12 19:02:49 +01:00
ggml.c              | llama/ggml: add LLM training support (#10544)                                                         | 2025-05-12 14:44:49 +02:00
gguf.cpp            | Fix clang warning in gguf_check_reserved_keys (#12686)                                                | 2025-04-01 13:12:53 +02:00