CUDA: faster FlashAttention, kernel for bs == 1
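For context on what a FlashAttention-style kernel computes: the key trick is online softmax, which processes attention scores tile by tile while rescaling running accumulators, so the full score row is never materialized. The sketch below is a hypothetical host-side illustration of that accumulation for a single query row (it is not the commit's CUDA code; function and variable names are invented for illustration):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the online-softmax accumulation used by
// FlashAttention-style kernels, shown on the host for one query row.
// Scores are consumed in tiles; the running max `m` and normalizer `l`
// are rescaled on the fly, so no full softmax row is ever stored.
float online_softmax_weighted_sum(const std::vector<float>& scores,
                                  const std::vector<float>& values,
                                  std::size_t tile) {
    float m   = -INFINITY; // running maximum of scores seen so far
    float l   = 0.0f;      // running sum of exp(score - m)
    float acc = 0.0f;      // running weighted sum of values
    for (std::size_t i0 = 0; i0 < scores.size(); i0 += tile) {
        const std::size_t i1 = std::min(i0 + tile, scores.size());
        for (std::size_t i = i0; i < i1; ++i) {
            const float m_new = std::max(m, scores[i]);
            const float scale = std::exp(m - m_new); // rescale old accumulators
            const float p     = std::exp(scores[i] - m_new);
            l   = l   * scale + p;
            acc = acc * scale + p * values[i];
            m   = m_new;
        }
    }
    return acc / l; // equals dot(softmax(scores), values)
}
```

The same rescaling works regardless of tile size, which is what lets a GPU kernel choose tile shapes to fit shared memory and, as in this commit, specialize them per batch size.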
parent: 08e69c5008
commit: 75aa7b4b18
ggml-cuda/fattn.cu: 1419 changed lines
File diff suppressed because it is too large