CUDA Kernels and FlashAttention: Why Memory Bandwidth Is the Bottleneck
May 18, 2026