Data quality from crowdsourcing (paper)

Year: 2009. Citations: 252.

Topics: Mobile Crowdsensing and Crowdsourcing; Spam and Phishing Detection; Data Stream Mining Techniques; Smartphones

Abstract

Annotation acquisition is an essential step in training supervised classifiers. However, manual annotation is often time-consuming and expensive. Recruiting annotators through Internet services (e.g., Amazon Mechanical Turk) is an appealing option that allows multiple labeling tasks to be outsourced in bulk, typically with low overall costs and fast completion rates. In this paper, we consider the difficult problem of classifying sentiment in political blog snippets. We examine annotation data from both expert annotators in a research lab and non-expert annotators recruited from the Internet. We identify three criteria for selecting high-quality annotations: noise level, sentiment ambiguity, and lexical uncertainty. Our analysis confirms the utility of these criteria for improving data quality. We conduct an empirical study to examine the effect of noisy annotations on the performance of sentiment classification models, and evaluate the utility of annotation selection for classification accuracy and efficiency.
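
As a rough illustration of the first criterion only (noise level), the sketch below keeps a snippet when enough of its annotators agree on one label. The abstract does not say how the paper measures noise, so the agreement ratio, the 0.75 threshold, the function name `select_by_agreement`, and the sample labels are all assumptions made for this example, not the authors' method.

```python
from collections import Counter


def select_by_agreement(labels_by_item, min_agreement=0.75):
    """Keep items whose most common label reaches min_agreement.

    labels_by_item maps an item id to the list of labels that different
    annotators assigned to it. Both the agreement ratio and the default
    threshold are illustrative assumptions, not taken from the paper.
    Returns a dict mapping item id -> (majority_label, agreement).
    """
    selected = {}
    for item_id, labels in labels_by_item.items():
        if not labels:
            continue  # nothing to aggregate for this item
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        if agreement >= min_agreement:
            selected[item_id] = (label, agreement)
    return selected


# Hypothetical crowd labels for three blog snippets.
annotations = {
    "snippet-1": ["pos", "pos", "pos", "neg"],
    "snippet-2": ["pos", "neg", "neg", "pos"],
    "snippet-3": ["neg", "neg", "neg", "neg"],
}
selected = select_by_agreement(annotations)
```

With these sample labels, the evenly split `snippet-2` is dropped and the other two snippets are kept with their majority label. Raising `min_agreement` trades a smaller training set for cleaner labels.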

Authors

Vikas Sindhwani

Prem Melville

Pei-Yun Hsueh
