Efficient Multidimensional Suppression for K-Anonymity

Bibliographic Details
Published in:	IEEE transactions on knowledge and data engineering Vol. 22; no. 3; pp. 334 - 347
Main Authors:	Kisilevich, S., Rokach, L., Elovici, Y., Shapira, B.
Format:	Journal Article
Language:	English
Published:	New York, NY: IEEE; IEEE Computer Society; The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01-03-2010
Subjects:	Algorithms | Anonymity | Applied sciences | Classification | Classification tree analysis | Computer science; control theory; systems | Data analysis | Data integrity | Data mining | Data privacy | Data processing. List processing. Character string processing | Database | Decision tree | Decision trees | deindentified data | Delta modulation | Diseases | Exact sciences and technology | Graph theory | Hierarchies | Identifier | Information extraction | Information retrieval. Graph | Information systems. Data bases | k-anonymity | Memory and file management (including protection and security) | Memory organisation. Data processing | Multidimensional systems | National security | Preserving | Privacy-preserving data mining | Private life | Releasing | Semantics | Software | Studies | Taxonomy | Theoretical computing | Trees

Description
Summary:	Many applications that employ data mining techniques involve mining data that include private and sensitive information about the subjects. One way to enable effective data mining while preserving privacy is to anonymize the data set that includes private information about subjects before it is released for data mining. One way to anonymize a data set is to manipulate its content so that the records adhere to k-anonymity. Two common manipulation techniques used to achieve k-anonymity are generalization and suppression. Generalization refers to replacing a value with a less specific but semantically consistent value, while suppression refers to not releasing a value at all. Generalization is more commonly applied in this domain, since suppression may dramatically reduce the quality of the data mining results if not properly used. However, generalization has a major drawback: it requires a manually generated domain hierarchy taxonomy for every quasi-identifier in the data set on which k-anonymity has to be enforced. In this paper, we propose a new method for achieving k-anonymity named K-anonymity of Classification Trees Using Suppression (kACTUS). In kACTUS, efficient multidimensional suppression is performed, i.e., values are suppressed only on certain records depending on other attribute values, without the need for manually produced domain hierarchy trees. Thus, in kACTUS, we identify attributes that have less influence on the classification of the data records and suppress them if needed in order to comply with k-anonymity. The kACTUS method was evaluated on 10 separate data sets to assess its accuracy compared with other k-anonymity generalization- and suppression-based methods. Encouraging results suggest that kACTUS' predictive performance is better than that of existing k-anonymity algorithms. Specifically, on average, the accuracies of TDS, TDR, and kADET are lower than that of kACTUS by 3.5, 3.3, and 1.9 percent, respectively, despite their use of manually defined domain trees. The accuracy gap increases to 5.3, 4.3, and 3.1 percent, respectively, when no domain trees are used.
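
The two ideas the summary relies on, the k-anonymity condition and suppression applied only to certain records, can be illustrated with a small Python sketch. This is not the kACTUS algorithm: kACTUS derives the attributes to suppress from a classification tree, whereas here the suppression order is a hypothetical ranking supplied by the caller. The function names, the `*` placeholder, and the toy records are all assumptions made for illustration, and the simple loop does not guarantee k-anonymity for arbitrary inputs.

```python
from collections import Counter

SUPPRESSED = "*"  # assumed placeholder for a value that is not released


def group_key(record, quasi_identifiers):
    """Tuple of quasi-identifier values that defines a record's group."""
    return tuple(record[attr] for attr in quasi_identifiers)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    counts = Counter(group_key(r, quasi_identifiers) for r in records)
    return all(count >= k for count in counts.values())


def suppress_small_groups(records, quasi_identifiers, k, order):
    """Suppress values only in records whose group is smaller than k.

    `order` is a hypothetical caller-supplied ranking of quasi-identifiers,
    from least to most important for classification. Records in groups
    that already have k members are left untouched.
    """
    result = [dict(r) for r in records]  # work on copies, keep the input intact
    for attr in order:
        counts = Counter(group_key(r, quasi_identifiers) for r in result)
        for r in result:
            if counts[group_key(r, quasi_identifiers)] < k:
                r[attr] = SUPPRESSED
    return result


# Toy data: "age" and "zip" are the quasi-identifiers, "label" is the class.
records = [
    {"age": 30, "zip": "100", "label": "yes"},
    {"age": 30, "zip": "100", "label": "no"},
    {"age": 30, "zip": "200", "label": "yes"},
    {"age": 30, "zip": "300", "label": "no"},
]
quasi_identifiers = ["age", "zip"]

before = is_k_anonymous(records, quasi_identifiers, k=2)
anonymized = suppress_small_groups(records, quasi_identifiers, k=2,
                                   order=["zip", "age"])
after = is_k_anonymous(anonymized, quasi_identifiers, k=2)
```

In this toy run the `zip` value is suppressed only in the last two records, which then share the group `(30, "*")`, while the first two records keep their original values. A single-dimensional scheme would instead have to suppress `zip` in every record.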
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2009.91