Commit Graph

5 Commits

Author SHA1 Message Date
Johannes Gäßler 84d9277fe2 split fattn compile via extern templates 2024-05-29 14:16:40 +02:00
Johannes Gäßler 462add6a01 try CI fix 2024-05-27 19:42:16 +02:00
Johannes Gäßler 672244a88b CUDA: quantized KV support for FA vec 2024-05-27 19:42:16 +02:00
Johannes Gäßler 133d99c599 CUDA: deduplicate FlashAttention code (#7352) 2024-05-18 12:36:25 +02:00
Johannes Gäßler dc685be466 CUDA: add FP32 FlashAttention vector kernel (#7188) 2024-05-12 19:40:45 +02:00
    * CUDA: add FP32 FlashAttention vector kernel
    * fixup! CUDA: add FP32 FlashAttention vector kernel
    * fixup! fixup! CUDA: add FP32 FlashAttention vector kernel
    * fixup! fixup! fixup! CUDA: add FP32 FlashAttention vector kernel