Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned · Paper
2019 · Citations: 1048
Natural Language Processing Techniques · Topic Modeling · Software Engineering Research