A Framework For Privacy Diagnosis And Preservation In Data Publishing

dc.contributor.author: Mirakabad, Mohammad Reza Zare
dc.date.accessioned: 2018-06-04T07:51:00Z
dc.date.available: 2018-06-04T07:51:00Z
dc.date.issued: 2010-04
dc.description.abstract: Privacy preservation in data publishing aims to publish data while protecting private information. Although removing the direct identifiers of individuals seems at first glance to protect their anonymity, private information may still be revealed by joining the published data with external data. Privacy preservation addresses this issue through the k-anonymity and l-diversity principles. Accordingly, privacy preservation techniques, namely k-anonymization and l-diversification algorithms, transform data (for example by generalization, suppression or fragmentation) to protect the identity and the sensitive information of individuals, respectively. Most recent efforts have focused on these privacy preservation techniques. However, little effort has been devoted to devising techniques, tools and methodologies that assist data publishers, managers and analysts in investigating and evaluating privacy risks. Hence, the idea of a privacy diagnosis centre is proposed, which offers the necessary framework for diagnosing privacy risk, specifically with respect to k-anonymity and l-diversity. It is shown that this is a knowledge discovery problem that can be mapped to the framework proposed by Mannila and Toivonen. By introducing and proving the necessary monotonicity properties, levelwise algorithms based on the Apriori algorithm are presented and evaluated. Moreover, previously proposed models and techniques for privacy preservation still have deficiencies and drawbacks. Specifically, clustering-based algorithms for k-anonymization may result in high information loss. By showing the deficiencies of both small and big clusters, two-phase clustering k-anonymization is proposed. It allows clusters to become sufficiently big in the first phase, and big clusters are split into the smallest possible clusters in the next phase; both steps result in lower information loss.
In addition, it is shown that extending k-anonymization algorithms to some l-diversity principles is not straightforward: the extension may result in high information loss or may fail to terminate. Accordingly, bucket clustering l-diversification is proposed to guarantee both termination and low information loss. The proposed algorithms are implemented and run on two sample datasets, namely Adults and OCC, which have become de facto benchmarks for privacy preservation algorithms. The effectiveness and efficiency of the proposed framework and algorithms are demonstrated experimentally by analyzing the results. [en_US]
dc.identifier.uri: http://hdl.handle.net/123456789/5656
dc.language.iso: en [en_US]
dc.publisher: Universiti Sains Malaysia [en_US]
dc.subject: A framework for privacy diagnosis [en_US]
dc.subject: and Preservation in data publishing [en_US]
dc.title: A Framework For Privacy Diagnosis And Preservation In Data Publishing [en_US]
dc.type: Thesis [en_US]
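
The abstract refers to diagnosing k-anonymity and l-diversity and to levelwise, Apriori-style search built on a monotonicity property. The following is a minimal illustrative sketch of those ideas, not the thesis's own algorithms. It assumes the simplest "distinct" form of l-diversity, and all function names, attribute names and sample rows are hypothetical. The monotonicity used here is the standard observation that if an attribute set breaks k-anonymity, every superset of it does too.

```python
from collections import defaultdict
from itertools import combinations


def groups(rows, quasi_identifiers):
    """Group rows by their values on the given quasi-identifier attributes."""
    out = defaultdict(list)
    for row in rows:
        out[tuple(row[a] for a in quasi_identifiers)].append(row)
    return out


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every group of identical quasi-identifier values has >= k rows."""
    return all(len(g) >= k for g in groups(rows, quasi_identifiers).values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every group contains >= l distinct sensitive values
    (the 'distinct' variant of l-diversity; an assumption of this sketch)."""
    return all(
        len({r[sensitive] for r in g}) >= l
        for g in groups(rows, quasi_identifiers).values()
    )


def minimal_violating_sets(rows, attributes, k):
    """Levelwise search for the minimal attribute sets that break k-anonymity.

    Candidates are visited by increasing size. A candidate that contains an
    already found violating set is skipped: by monotonicity it also violates
    k-anonymity, but it is not minimal.
    """
    minimal = []
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(attributes, size):
            if any(set(m) <= set(candidate) for m in minimal):
                continue
            if not is_k_anonymous(rows, candidate, k):
                minimal.append(candidate)
    return minimal


# Hypothetical, already generalized sample table.
rows = [
    {"age": "30-39", "zip": "110**", "sex": "F", "disease": "flu"},
    {"age": "30-39", "zip": "110**", "sex": "M", "disease": "asthma"},
    {"age": "40-49", "zip": "120**", "sex": "F", "disease": "flu"},
    {"age": "40-49", "zip": "120**", "sex": "M", "disease": "flu"},
]

print(is_k_anonymous(rows, ["age", "zip"], 2))
print(is_distinct_l_diverse(rows, ["age", "zip"], "disease", 2))
print(minimal_violating_sets(rows, ["age", "zip", "sex"], 2))
```

In this sample the table is 2-anonymous on age and zip, but the second group holds only one distinct disease value, so it is not distinct 2-diverse; adding sex to either attribute breaks 2-anonymity, which is what the levelwise search reports.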