EVALUATION OF THE EFFECTIVENESS OF PRIVACY-PRESERVING TECHNIQUES: K-ANONYMITY, L-DIVERSITY, AND T-CLOSENESS ON SENSITIVE DATA

Ronal; Desy Ebigael Tambunan; Yuliana

doi:10.52972/hoaq.vol17no1.p102-111

Authors

Ronal, Institut Teknologi Sumatera
Desy Ebigael Tambunan, STMIK LIKMI
Yuliana, Institut Teknologi Sumatera


Keywords:

Data Privacy, Anonymization, k-Anonymity, l-Diversity, t-Closeness

Abstract


Privacy protection has become a crucial aspect of collecting, processing, and publishing sensitive data, yet the risk of information leakage can lead to legal consequences and reputational damage. To balance data utility against individual privacy, anonymization is the primary approach; this study implements k-anonymity and evaluates the result with l-diversity and t-closeness. The aim is to assess how effectively these techniques reduce the risk of identity and attribute disclosure in a healthcare dataset. The case study uses a medical dataset of 55,500 records with the quasi-identifiers Age, Gender, and Blood Type and the sensitive attribute Medical Condition. The dataset was anonymized with k-anonymity through generalization and suppression, forming equivalence classes with a minimum size of k ≥ 5. It was then evaluated with l-diversity, which measures the diversity of the sensitive attribute within each equivalence class, and with t-closeness, which uses the Earth Mover’s Distance (EMD) to measure how closely the distribution of the sensitive attribute in each class matches the global distribution. The results show that all equivalence classes satisfy k ≥ 5, at a suppression rate of 1.15%. The l-diversity evaluation finds no equivalence class with l < 2, which reduces the risk of attribute disclosure. The t-closeness assessment shows that most classes have EMD ≤ 0.15 and only one class slightly exceeds the threshold t = 0.2. In terms of data utility, a Normalized Generalized Information Loss (NGIL) of 0.079 (7.9%) and an average equivalence class size (AECS) of 6.28 indicate low information loss without over-generalization. Overall, the combined methods balance privacy protection and data utility, leaving the dataset suitable for further analysis and secondary data publication.
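The abstract names the privacy checks without showing how they are computed. The following is a minimal Python sketch of those checks, under several assumptions that are not taken from the paper. The table is a list of dicts. Age is generalized into hypothetical 10-year intervals. k-anonymity is enforced by suppressing every record in a class smaller than k. l is measured as distinct l-diversity. EMD uses an equal ground distance of 1 between every pair of medical conditions, in which case it reduces to half the L1 distance. The paper may use a different generalization hierarchy, l-diversity variant, or ground distance. All function names (`generalize_age`, `suppress_small_classes`, `distinct_l`, `max_emd`, `avg_class_size`) are hypothetical, and the toy rows are invented for illustration rather than drawn from the study dataset.

```python
from collections import Counter, defaultdict


def generalize_age(age, width=10):
    # Hypothetical generalization: exact age -> fixed-width interval label.
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def equivalence_classes(records, qis):
    # Group records that share the same (generalized) quasi-identifier values.
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in qis)].append(r)
    return classes


def suppress_small_classes(records, qis, k):
    # Drop every record whose equivalence class has fewer than k members,
    # so the remaining table is k-anonymous; also return the suppression rate.
    classes = equivalence_classes(records, qis)
    kept = [r for group in classes.values() if len(group) >= k for r in group]
    rate = 1 - len(kept) / len(records) if records else 0.0
    return kept, rate


def distinct_l(classes, sensitive):
    # Distinct l-diversity: the smallest number of distinct sensitive values in any class.
    return min(len({r[sensitive] for r in g}) for g in classes.values())


def distribution(values):
    # Relative frequency of each value.
    counts = Counter(values)
    n = sum(counts.values())
    return {v: c / n for v, c in counts.items()}


def emd_equal_distance(p, q):
    # EMD between two categorical distributions when every pair of categories is at
    # ground distance 1 (an assumption); this reduces to half the L1 distance.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in keys)


def max_emd(classes, sensitive):
    # Largest EMD between any class's sensitive distribution and the global one;
    # the table satisfies t-closeness for every t >= this value.
    global_dist = distribution([r[sensitive] for g in classes.values() for r in g])
    return max(
        emd_equal_distance(distribution([r[sensitive] for r in g]), global_dist)
        for g in classes.values()
    )


def avg_class_size(classes, k=None):
    # Average class size |T| / |EC|; with k, the normalized variant (|T| / |EC|) / k.
    avg = sum(len(g) for g in classes.values()) / len(classes)
    return avg if k is None else avg / k


# Invented toy rows (Age, Gender, Blood Type, Medical Condition), not study data.
RAW = [
    (23, "F", "A+", "Asthma"),
    (27, "F", "A+", "Diabetes"),
    (25, "F", "A+", "Asthma"),
    (34, "M", "O-", "Cancer"),
    (38, "M", "O-", "Diabetes"),
    (61, "M", "B+", "Obesity"),  # alone in its class, so suppressed when k = 2
]
QIS = ("Age", "Gender", "Blood Type")
SENSITIVE = "Medical Condition"

records = [
    {"Age": generalize_age(a), "Gender": g, "Blood Type": b, SENSITIVE: m}
    for a, g, b, m in RAW
]
kept, suppression_rate = suppress_small_classes(records, QIS, k=2)
classes = equivalence_classes(kept, QIS)

print(f"suppression rate: {suppression_rate:.3f}")     # 0.167
print("distinct l:", distinct_l(classes, SENSITIVE))   # 2
print(f"max EMD: {max_emd(classes, SENSITIVE):.3f}")   # 0.400
print("average class size:", avg_class_size(classes))  # 2.5
```

With the study's settings, one would call `suppress_small_classes(..., k=5)` and compare `max_emd` against t = 0.2. Whether the reported AECS of 6.28 is the raw average or the k-normalized variant is not stated on this page, so both are exposed through the optional `k` argument. The paper's NGIL formula is also not given here and is not reproduced.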

