Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned (paper)

2019 · Cited by 1048
Natural Language Processing Techniques · Topic Modeling · Software Engineering Research

Related techniques