The cost of privacy (paper)

2008, cited by 322

Privacy-Preserving Technologies in Data; Cryptography and Data Security; Privacy, Security, and Data Protection

Abstract

Re-identification is a major privacy threat to public datasets containing individual records. Many privacy protection algorithms rely on generalization and suppression of "quasi-identifier" attributes such as ZIP code and birthdate. Their objective is usually syntactic sanitization: for example, k-anonymity requires that each "quasi-identifier" tuple appear in at least k records, while l-diversity requires that the distribution of sensitive attributes for each quasi-identifier have high entropy. The utility of sanitized data is also measured syntactically, by the number of generalization steps applied or the number of records with the same quasi-identifier. In this paper, we ask whether generalization and suppression of quasi-identifiers offer any benefits over trivial sanitization which simply separates quasi-identifiers from sensitive attributes. Previous work showed that k-anonymous databases can be useful for data mining, but k-anonymization does not guarantee any privacy. By contrast, we measure the tradeoff between privacy (how much can the adversary learn from the sanitized records?) and utility, measured as accuracy of data-mining algorithms executed on the same sanitized records.
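The syntactic definitions in the abstract can be made concrete with a small sketch. This is an illustration only, not the authors' code: the function names, the record layout (a list of dicts), and the toy data are assumptions. It checks the k-anonymity condition, reports the lowest entropy of the sensitive attribute across quasi-identifier groups (the quantity l-diversity asks to be high), and shows one possible reading of trivial sanitization, in which quasi-identifiers and sensitive values are released as two separate, independently sorted tables.

```python
from collections import Counter, defaultdict
from math import log2


def is_k_anonymous(records, quasi_ids, k):
    """True if every quasi-identifier tuple appears in at least k records."""
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())


def min_sensitive_entropy(records, quasi_ids, sensitive):
    """Lowest Shannon entropy (in bits) of the sensitive attribute
    over all quasi-identifier groups."""
    groups = defaultdict(Counter)
    for r in records:
        groups[tuple(r[a] for a in quasi_ids)][r[sensitive]] += 1
    entropies = []
    for counter in groups.values():
        total = sum(counter.values())
        entropies.append(
            sum((n / total) * log2(total / n) for n in counter.values())
        )
    return min(entropies)


def trivial_sanitization(records, quasi_ids, sensitive):
    """Assumed reading of 'trivial sanitization': release quasi-identifiers
    and sensitive values as two tables, each sorted independently so that
    row order no longer links them."""
    qi_table = sorted(tuple(r[a] for a in quasi_ids) for r in records)
    sensitive_table = sorted(r[sensitive] for r in records)
    return qi_table, sensitive_table


# Hypothetical, already-generalized toy records.
records = [
    {"zip": "787**", "birth": "197*", "disease": "flu"},
    {"zip": "787**", "birth": "197*", "disease": "asthma"},
    {"zip": "786**", "birth": "198*", "disease": "flu"},
    {"zip": "786**", "birth": "198*", "disease": "flu"},
]
quasi_ids = ["zip", "birth"]
```

On this toy table, `is_k_anonymous(records, quasi_ids, 2)` holds, yet the second group contains only "flu", so its entropy is zero and the sensitive value of anyone in that group is disclosed. This is the kind of gap between syntactic sanitization and actual privacy that the abstract points to.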

Authors

Vitaly Shmatikov

Justin Brickell
