C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs

ArXiv CS.AI | 2026-05-28 | Authors: Rui Gao, Youngseung Jeon, Swastik Roy, Morteza Ziyadi, Xiang 'Anthony' Chen

Abstract

arXiv:2604.23061v2 (announce type: replace-cross)

Large language models (LLMs) show promise for molecular optimization, but aligning them with selective and competing drug-design constraints remains challenging. We propose C-Moral, a reinforcement learning post-training framework for controllable multi-objective molecular optimization. C-Moral combines group-based relative optimization, property score alignment for heterogeneous objectives, and bottleneck-sensitive non-linear reward aggregation to improve stability across competing molecular properties. Experiments on C-MuMOInstruct and S²-Bench MolOpt show that C-Moral achieves the best performance among the compared methods on both benchmarks. On C-MuMOInstruct, C-Moral achieves the best Success Optimized Rate (SOR), 48.9% on in-domain tasks and 39.5% on out-of-domain tasks, while preserving scaffold similarity. On S²-Bench MolOpt, it also achieves the strongest results across the LogP, MR, and QED optimization tasks.
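The abstract names three ingredients without giving formulas: property score alignment, bottleneck-sensitive non-linear aggregation, and group-based relative optimization. The sketch below shows one plausible reading of how such pieces could fit together; it is not the authors' implementation. The min-max alignment, the soft-minimum aggregator, the `beta` parameter, and all function names are assumptions made for illustration, and the group-relative step follows the general mean-and-standard-deviation normalization used in group-based policy optimization.

```python
import math
from statistics import mean, pstdev


def align_score(value, low, high, higher_is_better=True):
    """Map a raw property value onto [0, 1] (assumed min-max alignment)."""
    scaled = (value - low) / (high - low)
    scaled = min(1.0, max(0.0, scaled))  # clip values outside [low, high]
    return scaled if higher_is_better else 1.0 - scaled


def soft_min(scores, beta=10.0):
    """Smooth minimum of aligned scores (assumed aggregator).

    The result lies between min(scores) and mean(scores), and moves
    toward the weakest score as beta grows, so one poor property
    drags the whole reward down.
    """
    return -math.log(mean(math.exp(-beta * s) for s in scores)) / beta


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled candidates."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical aligned scores for two candidate molecules on three properties.
balanced = [0.6, 0.6, 0.6]
lopsided = [1.0, 1.0, 0.1]

rewards = [soft_min(balanced), soft_min(lopsided)]
advantages = group_relative_advantages(rewards)
```

Under these assumptions the lopsided candidate has the higher arithmetic mean but the lower aggregated reward, which is the behavior a bottleneck-sensitive aggregator is meant to produce: a plain average would reward trading one property away for the others.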