Turku University of Applied Sciences, Theses (Open collection)

Assessing Linkage Risk in Pseudonymized Datasets Under Modern Machine Learning Algorithms

Acosta Der Megerdichian, Juan Sebastian (2026)

All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Permanent address of this publication:
https://urn.fi/URN:NBN:fi:amk-202601302065
Abstract
Access to machine learning algorithms has expanded rapidly in recent years, lowering technical barriers and enabling a wider range of actors to perform advanced data analysis. At the same time, large datasets have become increasingly valuable for artificial intelligence training and scientific research, raising significant privacy concerns. In response, the European Union adopted the General Data Protection Regulation (GDPR) in 2016, which promotes pseudonymization as a safeguard for protecting individual identities in shared datasets. However, record linkage (the process of identifying the same individuals across multiple datasets) can enable unintended re-identification, particularly when machine learning techniques are applied.
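A minimal sketch of why pseudonymizing only the direct identifier may not prevent record linkage. It assumes two hypothetical parties that each replace a person's name with an HMAC-SHA256 pseudonym under their own secret key, while quasi-identifiers (postcode, birth year, sex) are released unchanged. Linkage here is a plain exact match on those attributes, a deliberately simpler stand-in for the machine learning approaches the study evaluates; all records, field names and keys are invented for illustration.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed-hash (HMAC-SHA256) pseudonym; deterministic for a fixed key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Hypothetical datasets held by two different parties.
hospital = [
    {"name": "Alice Virtanen", "zip": "20100", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"name": "Bob Laine", "zip": "20500", "birth_year": 1972, "sex": "M", "diagnosis": "diabetes"},
]
insurer = [
    {"name": "Bob Laine", "zip": "20500", "birth_year": 1972, "sex": "M", "premium": 410},
    {"name": "Carol Niemi", "zip": "20100", "birth_year": 1990, "sex": "F", "premium": 380},
]


def release(rows, key):
    """Replace the name with a pseudonym and keep every other attribute as-is."""
    return [
        {**{k: v for k, v in r.items() if k != "name"}, "pid": pseudonymize(r["name"], key)}
        for r in rows
    ]


def link_on_quasi_identifiers(left, right, qi=("zip", "birth_year", "sex")):
    """Exact-match record linkage on quasi-identifiers that survived pseudonymization."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[c] for c in qi), []).append(r)
    links = []
    for row in left:
        for match in index.get(tuple(row[c] for c in qi), []):
            links.append((row["pid"], match["pid"]))
    return links


# Each party uses its own (hypothetical) key, so the pseudonyms never match directly.
released_a = release(hospital, b"key-held-by-hospital")
released_b = release(insurer, b"key-held-by-insurer")
links = link_on_quasi_identifiers(released_a, released_b)
print(len(links))  # 1: Bob's two records are joined despite different pseudonyms
```

Keying the hash per party stops the pseudonyms themselves from acting as a join key, but the attributes left in the clear still do.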

This study takes an experimental approach to evaluating linkage risk in pseudonymized datasets under multiple conditions. It assesses the performance of several machine learning algorithms in three experimental scenarios: (1) linkage performance under different pseudonymization techniques, (2) the effect of incremental pseudonymization, applied one step at a time, and (3) whether models trained on pseudonymized data can successfully link records when a partially leaked, non-pseudonymized dataset becomes available.

The results indicate that even relatively simple machine learning models can achieve strong linkage performance across datasets. Stronger cryptographic pseudonymization techniques do not consistently reduce linkage capability, and in some cases pseudonymization appears to simplify the data in ways that facilitate linkage. These results raise concerns about the long-term robustness of current pseudonymization practices in an environment where machine learning tools are widely accessible, datasets are increasingly reused and shared, and data continues to grow in both research and economic value.
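One simple, well-known reason why stronger cryptography need not reduce linkage is determinism: a deterministic pseudonymization scheme applied consistently maps equal inputs to equal outputs, so exact joins survive regardless of how strong the hash is. The sketch below compares an unkeyed SHA-256 hash with a keyed HMAC-SHA256 on hypothetical identifiers. It illustrates this one mechanism only; it is not the thesis's experimental setup or its explanation of the reported results, and the identifiers and key are invented.

```python
import hashlib
import hmac


def sha256_pseudonym(value: str) -> str:
    """Unkeyed SHA-256: anyone can recompute it, so it is weak against dictionary attacks."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hmac_pseudonym(value: str, key: bytes = b"hypothetical-shared-key") -> str:
    """Keyed HMAC-SHA256: not recomputable without the key, but still deterministic."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def linkage_rate(ids_a, ids_b, scheme) -> float:
    """Fraction of records in A whose pseudonym also appears among B's pseudonyms."""
    pseudonyms_b = {scheme(i) for i in ids_b}
    matched = sum(1 for i in ids_a if scheme(i) in pseudonyms_b)
    return matched / len(ids_a)


# Hypothetical identifiers; two of A's four records also occur in B.
ids_a = ["P001", "P002", "P003", "P004"]
ids_b = ["P002", "P003", "P005"]

for name, scheme in [("sha256", sha256_pseudonym), ("hmac-sha256", hmac_pseudonym)]:
    print(name, linkage_rate(ids_a, ids_b, scheme))
# Both schemes print 0.5: the keyed hash resists reversal of the identifiers,
# but it leaves exact linkage between the two releases untouched.
```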