GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding (Event)

Name: GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding
Start: 2026-05-28

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.15250v2 (announce type: replace-cross). Abstract: Multi-head Latent Attention (MLA), the attention mechanism used in DeepSeek-V2/V3, jointly compresses keys and values into a low-rank latent and matches the H100 roofline almost perfectly. Its trained weights, however, expose only one decoding path (an absorbed MQA form), which ties efficient inference to H100-class compute-bandwidth ratios, forfeits tensor par
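To make the abstract's terms concrete, the following is a minimal single-head sketch of joint key/value compression into a low-rank latent, and of the "absorbed" decoding form in which the key up-projection is folded into the query. It is an illustration under simplifying assumptions, not the paper's implementation: the function name `mla_sketch`, the dimensions, and the random weights are hypothetical, and positional encoding, multiple heads, and the output projection are omitted.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def mla_sketch(seq_len=6, d_model=32, d_latent=8, d_head=16, seed=0):
    """Compare explicit and absorbed decoding for one head (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((seq_len, d_model))         # hidden states of past tokens
    w_down = rng.standard_normal((d_model, d_latent))   # shared K/V down-projection
    w_uk = rng.standard_normal((d_latent, d_head))      # key up-projection
    w_uv = rng.standard_normal((d_latent, d_head))      # value up-projection
    q = rng.standard_normal(d_head)                     # query for the current decode step

    # Only the latent is cached: d_latent numbers per token instead of 2 * d_head.
    c = x @ w_down

    # Explicit path: rebuild keys and values from the latent, then attend.
    k = c @ w_uk
    v = c @ w_uv
    p = softmax(k @ q / np.sqrt(d_head))
    out_explicit = p @ v

    # Absorbed path: fold w_uk into the query and attend directly over the latent,
    # so every query reads the same shared latent (an MQA-like access pattern).
    q_latent = w_uk @ q
    p_abs = softmax(c @ q_latent / np.sqrt(d_head))
    out_absorbed = (p_abs @ c) @ w_uv

    return out_explicit, out_absorbed, c.shape, k.shape
```

Both paths compute the same output because matrix multiplication is associative: `(c @ w_uk) @ q` equals `c @ (w_uk @ q)`. They differ in how much arithmetic is done per byte read from the cache, which is the compute-bandwidth trade-off the abstract refers to.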

Artificial Intelligence
