When Surface Form Changes Moderation Decisions: A Paired Study of Code-Mixed Workflow Instability

Event: PRODUCT_LAUNCH | Date: 2026-06-06 | Impact: MEDIUM

arXiv:2606.05654v1 | Announce Type: cross

Abstract: Hate moderation is often evaluated as classification on clean English inputs, but deployed systems must route content to actions such as ALLOW, FLAG, or REVIEW. We study how this workflow changes under code-mixed inputs using a paired evaluation setting where the same underlying content is expressed as clean English and Tamil-English code-mix. Under
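The paired setting described in the abstract can be illustrated with a small sketch. It assumes each item has already been routed twice, once in its clean English form and once in its Tamil-English code-mixed form, and it reports how often the two routing actions disagree. The function name `paired_instability`, the `Action` enum, and the example pairs are hypothetical and made up for illustration; they are not the paper's code, metric definitions, or results.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    """Routing actions named in the abstract."""
    ALLOW = "ALLOW"
    FLAG = "FLAG"
    REVIEW = "REVIEW"


def paired_instability(pairs):
    """Return (flip_rate, transitions) for paired routing decisions.

    pairs: list of (action_on_english, action_on_code_mix) tuples,
    one tuple per underlying piece of content.
    flip_rate: fraction of pairs whose two actions differ.
    transitions: Counter mapping each (english, code_mix) pair to its count.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    transitions = Counter(pairs)
    flips = sum(n for (en, cm), n in transitions.items() if en != cm)
    return flips / len(pairs), transitions


# Made-up decisions for four items (illustrative only, not data from the paper).
example_pairs = [
    (Action.FLAG, Action.FLAG),
    (Action.FLAG, Action.ALLOW),    # surface form changed the action
    (Action.REVIEW, Action.REVIEW),
    (Action.ALLOW, Action.ALLOW),
]

flip_rate, transitions = paired_instability(example_pairs)
print(flip_rate)  # 0.25
```

Because the content of each pair is held fixed, any disagreement counted here is attributable to surface form rather than to a difference in what is being said, and the transition counts show which direction the action moved.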