Macro: Enhancing Multilingual Counterfactual Explanations through Alignment-as-Preference Optimization
Event: PRODUCT_LAUNCH, 2026-06-05, Impact: MEDIUM
arXiv:2605.11632v2. Announce Type: replace.
Abstract: Self-generated counterfactual explanations (SCEs) are minimally modified inputs (minimality) generated by large language models (LLMs) that flip their own predictions (validity), offering a causally grounded approach to unraveling black-box LLM behavior. Yet extending them beyond English remains challenging: existing methods struggle to produ
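
The two properties the abstract names, validity and minimality, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy_predict` is a hypothetical keyword rule standing in for an LLM classifier, and normalized token-level edit distance is just one plausible way to quantify minimality, not necessarily the metric the paper uses.

```python
from typing import Callable, List


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance over tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_valid(predict: Callable[[str], str], original: str, counterfactual: str) -> bool:
    """Validity: the counterfactual flips the model's own prediction."""
    return predict(original) != predict(counterfactual)


def minimality(original: str, counterfactual: str) -> float:
    """Assumed metric: token edit distance normalized by the original length.

    Lower values mean a smaller modification of the input.
    """
    a, b = original.split(), counterfactual.split()
    return token_edit_distance(a, b) / max(len(a), 1)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in for an LLM classifier (simple keyword rule)."""
    return "negative" if "not" in text.lower().split() else "positive"


original = "the film was good"
counterfactual = "the film was not good"
print(is_valid(toy_predict, original, counterfactual))  # label flips, so valid
print(minimality(original, counterfactual))             # one inserted token out of four
```

In this toy case a single inserted token flips the label, so the counterfactual is valid with a small edit ratio. Swapping in a real model only requires replacing `toy_predict` with a function that maps text to a label.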