FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning 文章

ArXiv CS.AI2026-05-26NEWSen作者: Ankur Samanta, Rohan Gupta, Aditi Misra, Christian McIntosh Clarke, Jayakumar Rajadas

摘要

arXiv:2502.01184v2 Announce Type: replace-cross Abstract: Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions, limiting their ability to capture meaningful chemical substructure context. We introduce FragmentNet, a graph-to-sequence model built around a novel adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments of adjustable granularity, complemented by chemically aware spatial positional encodings that preserve molecular topology in the resulting sequence. Extending masked pre-training strategies from natural language processing to the molecular domain, we mask and reconstruct molecules at the level of chemically meaningful fragments rather than individual atoms.