llama.cpp/ggml
Gaurav Garg 99c3df8219 Write an optimized flash_attn_stream_k_fixup kernel
Write a specialized, more optimized kernel for the case where nblocks_stream_k is a multiple of ntiles_dst.
Make nblocks_stream_k a multiple of ntiles_dst if nblocks_stream_k > 2 * ntiles_dst.
2026-03-30 00:32:26 +05:30
cmake ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094) 2025-08-07 13:45:41 +02:00
include llama: fix llama-model-saver (#20503) 2026-03-25 12:53:16 +02:00
src Write an optimized flash_attn_stream_k_fixup kernel 2026-03-30 00:32:26 +05:30
.gitignore vulkan : cmake integration (#8119) 2024-07-13 18:12:39 +02:00
CMakeLists.txt ggml : bump version to 0.9.8 (ggml/1442) 2026-03-18 15:17:28 +02:00