llama.cpp/ggml
Akarshan Biswas b54cb2e3d0
sycl : add flash-attn support for head size 512 (#21654)
* sycl : add flash-attn support for head size 512

This patch extends the SYCL Flash Attention implementation to support a head size (DKQ/DV) of 512.

Changes:
- Added DKQ/DV 512 cases to both tile and vector Flash Attention kernels.
- Updated kernel selection logic to allow vector kernels for head sizes up to 512 (previously 256).
- Removed unused/redundant AMD and RDNA-specific configuration functions in `fattn-tile.hpp`.
- Refactored `ggml_backend_sycl_buffer_init_tensor` to use a switch statement for clearer tensor extra buffer initialization.
- Added necessary template instances for the new 512 head size across various quantization types.

* remove defunct mxfp4 reorder from setting buffer type
2026-04-09 09:36:48 +03:00
cmake           ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)  2025-08-07 13:45:41 +02:00
include         ggml : deprecate GGML_OP_ADD1 (#21363)                                    2026-04-07 15:28:27 +03:00
src             sycl : add flash-attn support for head size 512 (#21654)                  2026-04-09 09:36:48 +03:00
.gitignore
CMakeLists.txt  ggml : bump version to 0.9.11 (ggml/1456)                                 2026-04-02 10:39:00 +03:00