A Framework For Privacy Diagnosis And Preservation In Data Publishing

dc.contributor.author	Mirakabad, Mohammad Reza Zare
dc.date.accessioned	2018-06-04T07:51:00Z
dc.date.available	2018-06-04T07:51:00Z
dc.date.issued	2010-04
dc.description.abstract	Privacy preservation in data publishing aims to publish data while protecting private information. Although removing the direct identifiers of individuals appears at first glance to protect their anonymity, private information may still be revealed by joining the data with external data. Privacy preservation addresses this issue through the k-anonymity and l-diversity principles. Accordingly, privacy preservation techniques, namely k-anonymization and l-diversification algorithms, transform data (for example by generalization, suppression or fragmentation) to protect the identity and the sensitive information of individuals, respectively. Most recent efforts have focused on privacy preservation techniques. However, little effort has been devoted to devising techniques, tools and methodologies that assist data publishers, managers and analysts in investigating and evaluating privacy risks. Hence, the idea of a privacy diagnosis centre is proposed, which offers the necessary framework for diagnosing privacy risk, specifically with respect to k-anonymity and l-diversity. It is shown that this problem is a knowledge discovery problem that can be mapped to the framework proposed by Mannila and Toivonen. By introducing and proving the necessary monotonicity properties, levelwise algorithms based on the Apriori algorithm are presented and evaluated. Moreover, the models and techniques proposed for privacy preservation still have deficiencies and drawbacks. Specifically, clustering-based algorithms for k-anonymization may result in high information loss. By showing the deficiencies of both small and big clusters, two-phase clustering k-anonymization is proposed. It first allows clusters to become sufficiently big and, in the next phase, splits big clusters into the smallest possible clusters; both phases result in lower information loss.
In addition, it is shown that extending k-anonymization algorithms to some l-diversity principles is not straightforward: the extension may incur high information loss or may fail to terminate. Accordingly, bucket clustering l-diversification is proposed to guarantee both termination and low information loss. The proposed algorithms are implemented and run on two sample datasets, namely Adults and OCC, which have become de facto benchmarks for privacy preservation algorithms. The effectiveness and efficiency of the proposed framework and algorithms are demonstrated experimentally by analyzing the results.	en_US
dc.identifier.uri	http://hdl.handle.net/123456789/5656
dc.language.iso	en	en_US
dc.publisher	Universiti Sains Malaysia	en_US
dc.subject	A framework for privacy diagnosis	en_US
dc.subject	and Preservation in data publishing	en_US
dc.title	A Framework For Privacy Diagnosis And Preservation In Data Publishing	en_US
dc.type	Thesis	en_US
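
The two principles named in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the toy table and the choice of the "distinct" variant of l-diversity (each group must contain at least l different sensitive values) are assumptions made for illustration only. A table is treated as k-anonymous when every combination of quasi-identifier values is shared by at least k rows.

```python
from collections import defaultdict


def group_by_quasi_identifier(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[attr] for attr in quasi_identifiers)
        groups[key].append(row)
    return groups


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier group contains at least k rows."""
    groups = group_by_quasi_identifier(rows, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_distinct_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every group holds at least l distinct sensitive values.

    This is the 'distinct' variant only; other l-diversity principles
    (an assumption: the thesis may use different ones) need other tests.
    """
    groups = group_by_quasi_identifier(rows, quasi_identifiers)
    return all(
        len({row[sensitive] for row in group}) >= l
        for group in groups.values()
    )


# Toy, already-generalized table (invented data, for illustration only).
rows = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]
qi = ["age", "zip"]

two_anonymous = is_k_anonymous(rows, qi, 2)
three_anonymous = is_k_anonymous(rows, qi, 3)
two_diverse = is_distinct_l_diverse(rows, qi, "disease", 2)
```

In this toy table each quasi-identifier group has two rows, so the 2-anonymity check passes while the 3-anonymity check fails. The first group contains only one disease value, so the table is 2-anonymous without being distinct 2-diverse, which is the kind of risk a diagnosis step would need to report.
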

Collections

Pusat Pengajian Sains Komputer - Tesis