With the development of networking, data collection and storage technologies, the use and sharing of large amounts of data have become possible. Data mining, otherwise known as knowledge discovery, extracts “meaningful information” or “knowledge” from large amounts of data and thereby supports people’s decision-making.
However, traditional data mining techniques and algorithms operate directly on the original dataset, which can leak private data. Due to growing privacy concerns, various privacy-preserving data mining techniques have been developed to address different privacy issues.
The main objective of privacy-preserving data mining (PPDM) is to develop algorithms that modify the original data in some way so that private data and private knowledge remain private even after the mining process. In PPDM, data mining algorithms are also analyzed for the side effects they have on data privacy.
Other approaches that employ cryptographic techniques to prevent information leakage are computationally very expensive.
Privacy-preserving data mining considers the problem of running data mining algorithms on confidential data that is not supposed to be revealed even to the party running the algorithm. PPDM methods protect the data by modifying it so that the sensitive original values are masked or erased.
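As a minimal illustration of the two kinds of modification named above, the sketch below erases a field by suppression and masks a numeric field with zero-mean additive noise. The record layout, the field names, the function names `suppress` and `mask_numeric`, and the uniform noise model are all illustrative assumptions, not a prescribed PPDM method.

```python
import random
from typing import Dict, List

# A record maps (hypothetical) field names to values.
Record = Dict[str, object]


def suppress(records: List[Record], fields: List[str]) -> List[Record]:
    """Erase: return copies of the records without the given fields."""
    return [{k: v for k, v in r.items() if k not in fields} for r in records]


def mask_numeric(records: List[Record], field: str, scale: float,
                 seed: int = 0) -> List[Record]:
    """Mask: add zero-mean uniform noise in [-scale, scale] to a numeric field."""
    rng = random.Random(seed)  # seeded only to make the example reproducible
    masked = []
    for r in records:
        copy = dict(r)  # never modify the caller's records
        copy[field] = r[field] + rng.uniform(-scale, scale)
        masked.append(copy)
    return masked


# Toy data; the names and values are made up for illustration.
records = [
    {"name": "Alice", "age": 34, "salary": 52000},
    {"name": "Bob", "age": 41, "salary": 61000},
]
released = mask_numeric(suppress(records, ["name"]), "salary", scale=5000)
```

Because the noise has zero mean, aggregates such as the average salary stay approximately correct over many records while individual values are hidden; the choice of `scale` trades privacy against data utility.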
The main consideration of PPDM is two-fold:
First, sensitive raw data such as identifiers, names and addresses should be modified or removed from the original database, so that the recipient of the data cannot compromise another person’s privacy.
Second, sensitive knowledge that can be mined from a database by data mining algorithms should also be excluded, because such knowledge can compromise data privacy just as well.
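The second point, hiding sensitive knowledge, is often handled by lowering the support of a sensitive itemset below the mining threshold so that a frequent-itemset miner no longer reports it. The sketch below is a simple greedy variant of that idea. The transaction format, the absolute support threshold, the function names, and the choices of which item to delete and from which transaction are illustrative assumptions rather than a specific published algorithm.

```python
from typing import FrozenSet, List, Set


def support(transactions: List[Set[str]], itemset: FrozenSet[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def hide_itemset(transactions: List[Set[str]], sensitive: FrozenSet[str],
                 min_support: int) -> List[Set[str]]:
    """Greedily lower the support of `sensitive` below `min_support`
    (an absolute count) by deleting one of its items from supporting
    transactions. The input list is left unchanged."""
    if not sensitive or min_support < 1:
        raise ValueError("need a non-empty itemset and min_support >= 1")
    sanitized = [set(t) for t in transactions]  # work on copies
    victim = min(sensitive)  # assumption: always delete the smallest item
    while support(sanitized, sensitive) >= min_support:
        # Assumption: the shortest supporting transaction is a cheap target,
        # on the heuristic that it supports fewer other itemsets.
        target = min((t for t in sanitized if sensitive <= t), key=len)
        target.discard(victim)  # each pass lowers the support by exactly one
    return sanitized


db = [{"bread", "milk", "beer"}, {"bread", "milk"},
      {"bread", "milk", "eggs"}, {"milk", "beer"}]
sensitive = frozenset({"bread", "milk"})
clean = hide_itemset(db, sensitive, min_support=2)
print(support(db, sensitive), support(clean, sensitive))  # 3 1
```

Deleting items in this way can also push non-sensitive itemsets (here `{bread}`) below the threshold; such side effects are what the evaluation criteria discussed next try to capture.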
An important aspect of developing and assessing algorithms and tools for privacy-preserving data mining is identifying suitable evaluation criteria and developing related benchmarks. One algorithm may perform better than another on specific criteria, such as performance and/or data utility, and worse on others.
It is therefore important to provide users with a set of metrics that enable them to select the most appropriate privacy-preserving technique for the data at hand, with respect to the specific parameters they wish to optimize.
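For knowledge hiding, two metrics used in the literature are hiding failure (the fraction of sensitive patterns that can still be mined after sanitization) and misses cost (the fraction of non-sensitive patterns lost as a side effect). The sketch below computes both for frequent itemsets by brute-force enumeration, which is only practical for tiny examples. The function names, the `max_size` cap, and the hand-written original and sanitized databases are assumptions for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def frequent_itemsets(transactions: List[Set[str]], min_support: int,
                      max_size: int = 3) -> Set[FrozenSet[str]]:
    """Brute-force enumeration of itemsets (up to `max_size` items) whose
    absolute support is at least `min_support`. Only for tiny examples."""
    items = sorted(set().union(*transactions))
    frequent = set()
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if sum(1 for t in transactions if candidate <= t) >= min_support:
                frequent.add(candidate)
    return frequent


def hiding_failure(original, sanitized, sensitive, min_support):
    """Fraction of sensitive itemsets frequent in `original` that are still
    frequent in `sanitized` (0.0 means all of them were hidden)."""
    before = frequent_itemsets(original, min_support) & sensitive
    after = frequent_itemsets(sanitized, min_support) & sensitive
    return len(after) / len(before) if before else 0.0


def misses_cost(original, sanitized, sensitive, min_support):
    """Fraction of non-sensitive frequent itemsets lost by sanitization."""
    before = frequent_itemsets(original, min_support) - sensitive
    after = frequent_itemsets(sanitized, min_support) - sensitive
    return len(before - after) / len(before) if before else 0.0


db = [{"bread", "milk", "beer"}, {"bread", "milk"},
      {"bread", "milk", "eggs"}, {"milk", "beer"}]
clean = [{"milk", "beer"}, {"milk"}, {"bread", "milk", "eggs"}, {"milk", "beer"}]
sensitive = {frozenset({"bread", "milk"})}
print(hiding_failure(db, clean, sensitive, 2))  # 0.0
print(misses_cost(db, clean, sensitive, 2))     # 0.25
```

Here the sensitive itemset is fully hidden, but one of the four non-sensitive frequent itemsets (`{bread}`) is lost, which illustrates the privacy versus utility trade-off such metrics are meant to expose.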