Privacy-preserving record linkage on large real world datasets

J Biomed Inform. 2014 Aug:50:205-12. doi: 10.1016/j.jbi.2013.12.003. Epub 2013 Dec 9.

Abstract

Record linkage typically relies on dedicated linkage units, which are supplied with personally identifying information in order to identify records belonging to the same individual within and across datasets. Data custodians separate this personally identifying information from clinical information before release. While this substantially reduces the risk of disclosing sensitive information, some residual risk remains and is a concern for some custodians. In this paper we trial a method of record linkage that reduces privacy risk still further on large real-world administrative data. The method uses encrypted personally identifying information (Bloom filters) in a probability-based linkage framework. The privacy-preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising over 26 million records in total. No difference in linkage quality was found when the results were compared with traditional probabilistic methods using full unencrypted personal identifiers. The method is therefore a possible means of reducing privacy risks related to record linkage in population-level research studies. It is hoped that, through adaptations of this or similar privacy-preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research can be fully realised.
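
To make the idea of Bloom filter encoded identifiers more concrete, the following is a minimal sketch of one common way such an encoding can be built and compared. It is not the implementation used in the study. The bigram tokenisation, the keyed double-hashing scheme, the filter length of 1000 bits, the 30 hash functions, the Dice coefficient, and all function names are assumptions chosen for illustration; the abstract does not state these details.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a padded, lower-cased string into its set of 2-character tokens."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, secret_key, num_bits=1000, num_hashes=30):
    """Encode a string as the set of bit positions set in a Bloom filter.

    Assumed scheme: each bigram is hashed with two keyed HMACs, and the
    bit positions are derived by double hashing, (h1 + i * h2) mod num_bits.
    """
    positions = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(secret_key, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(secret_key, data, hashlib.sha256).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % num_bits)
    return positions


def dice_similarity(filter_a, filter_b):
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    if not filter_a and not filter_b:
        return 1.0
    return 2 * len(filter_a & filter_b) / (len(filter_a) + len(filter_b))


# Hypothetical key; in practice it would be shared by data custodians
# and withheld from the linkage unit.
key = b"illustrative-shared-secret"

smith = bloom_encode("Smith", key)
smyth = bloom_encode("Smyth", key)
jones = bloom_encode("Jones", key)

print("Smith vs Smyth:", round(dice_similarity(smith, smyth), 3))
print("Smith vs Jones:", round(dice_similarity(smith, jones), 3))
```

In a sketch like this, the linkage unit would see only the encoded fields. Similar names share most of their bigrams and therefore many set bits, so a similarity score between encoded fields could plausibly stand in for the field comparison in a probabilistic linkage framework.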

Keywords: Bloom filters; Data integration; Population based research; Privacy preserving protocols; Privacy preserving record linkage; Record linkage.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Security*
  • Datasets as Topic*
  • Medical Record Linkage*
  • Privacy*
  • Western Australia