To MRL or not to MRL: Text Embeddings are Robust to Truncation Without Matryoshka Learning, Except In Heavy Truncation Scenarios

arXiv cs.CL, 2026-05-29. Authors: Sotaro Takeshita, Yurina Takeshita, Simone Paolo Ponzetto, Daniel Ruffinelli

Abstract

arXiv:2605.16608v2. Matryoshka Representation Learning (MRL) is a widely adopted approach for training text encoders so that they provide useful text representations at various sizes, obtained by simply truncating the resulting vectors to sizes predetermined at training time. Recent work has shown that randomly truncating text embeddings has minimal impact on downstream performance unless vectors are reduced in size by at least 70%, suggesting that embeddings are already robust to truncation without MRL. However, no prior work has compared random truncation to MRL, so it is unclear how the two compare as embedding reduction methods. In this paper, we study this by applying the same truncation used by MRL to models trained with and without MRL. Our results across several models and downstream tasks show that, unless heavily truncating embeddings (i.e.
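To make the two kinds of truncation mentioned in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' code. It assumes that MRL-style truncation means keeping the first k dimensions of a vector, and that "random truncation" means keeping k dimensions chosen at random (the same ones for every vector). The function names, the 768-dimensional toy vectors, and the 30% keep ratio are all hypothetical choices made for illustration; no real encoder is involved.

```python
import math
import random


def truncate_prefix(vec, k):
    """Keep the first k dimensions (assumed here to be the MRL-style truncation)."""
    return vec[:k]


def truncate_random(vec, k, seed=0):
    """Keep k randomly chosen dimensions.

    The seed is fixed so that every vector keeps the same dimensions;
    otherwise truncated vectors would not be comparable to each other.
    """
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(vec)), k))
    return [vec[i] for i in keep]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings: a random "query" and a noisy copy as the "document".
rng = random.Random(42)
dim = 768
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
doc = [q + 0.5 * rng.gauss(0.0, 1.0) for q in query]

# Keeping 30% of the dimensions corresponds to a 70% size reduction.
keep = int(dim * 0.3)

sim_full = cosine(query, doc)
sim_prefix = cosine(truncate_prefix(query, keep), truncate_prefix(doc, keep))
sim_random = cosine(truncate_random(query, keep), truncate_random(doc, keep))

print(f"full ({dim} dims):     {sim_full:.3f}")
print(f"prefix ({keep} dims):  {sim_prefix:.3f}")
print(f"random ({keep} dims):  {sim_random:.3f}")
```

With Gaussian toy vectors, no dimension carries more information than any other, so the three similarities should come out close to one another. Whether the same holds for real encoders, with or without MRL training, is the empirical question the paper studies; this sketch only shows the mechanics of the two truncation schemes.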
