Add FlashAttention v2 support for Attention, MultiHeadAttention and PackedMultiHeadAttention ops. Optimize BeamScore to improve BeamSearch performance. Improve LLM quantization accuracy with SmoothQuant.
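For readers unfamiliar with SmoothQuant, here is a minimal sketch of the idea as it is usually described: activation outliers are scaled down per input channel and the matching weight rows are scaled up by the same factor, so the layer output is unchanged while the activations become easier to quantize. This is an illustration only, not the implementation shipped in this release. The function names (`smooth_scales`, `apply_smoothing`, `matmul`), the `eps` clamp, and the toy values are all assumptions made for the example.

```python
def smooth_scales(X, W, alpha=0.5, eps=1e-5):
    """Per-input-channel scales: s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha).

    X is a list of activation rows (tokens x in_channels).
    W is a list of weight rows (in_channels x out_channels).
    eps is a hypothetical guard against all-zero channels.
    """
    scales = []
    for j in range(len(W)):
        act_max = max(max(abs(row[j]) for row in X), eps)
        w_max = max(max(abs(w) for w in W[j]), eps)
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    return scales


def apply_smoothing(X, W, scales):
    """Divide activation columns by s_j and multiply weight rows by s_j."""
    X_s = [[x / s for x, s in zip(row, scales)] for row in X]
    W_s = [[w * s for w in row] for row, s in zip(W, scales)]
    return X_s, W_s


def matmul(A, B):
    """Plain-Python matrix product, used only to check the result."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Toy data: input channel 1 carries activation outliers.
X = [[0.1, 20.0], [-0.2, -16.0]]
W = [[0.5, -0.4], [0.02, 0.01]]

scales = smooth_scales(X, W)
X_s, W_s = apply_smoothing(X, W, scales)

# The smoothed pair gives the same layer output as the original pair
# (up to floating-point rounding), with a much smaller activation range.
Y = matmul(X, W)
Y_s = matmul(X_s, W_s)
```

With `alpha=0.5`, each channel's activation maximum and weight maximum end up equal, which is why the outlier channel no longer dominates the quantization range. Other values of `alpha` shift more of the difficulty onto the weights or onto the activations.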