idSCD: Identifying Training Datasets through Semantic Correlation Descriptors · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2605.30462v1 · Announce Type: cross

Abstract: Can a dataset be recognized from the spurious correlations it induces during training? We argue that datasets leave dataset-specific traces in a model's learned semantic correlation structure: incidental regularities that are predictive within a dataset, but not causal for the underlying task, can be internalized during training. We use this insight to study dataset-le
