llama.cpp/ggml
Akarshan Biswas b54cb2e3d0
sycl : add flash-attn support for head size 512 (#21654)
* sycl : add flash-attn support for head size 512

This patch extends the SYCL Flash Attention implementation to support a head size (DKQ/DV) of 512.

Changes:
- Added DKQ/DV 512 cases to both tile and vector Flash Attention kernels.
- Updated kernel selection logic to allow vector kernels for head sizes up to 512 (previously 256).
- Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
- Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
- Added necessary template instances for the new 512 head size across various quantization types.

* remove defunct mxfp4 reorder from setting buffer type
2026-04-09 09:36:48 +03:00
cmake           ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)  2025-08-07 13:45:41 +02:00
include         ggml : deprecate GGML_OP_ADD1 (#21363)                                    2026-04-07 15:28:27 +03:00
src             sycl : add flash-attn support for head size 512 (#21654)                  2026-04-09 09:36:48 +03:00
.gitignore
CMakeLists.txt  ggml : bump version to 0.9.11 (ggml/1456)                                 2026-04-02 10:39:00 +03:00