Probabilistic Record Linkage of Hospital Patients - Chris Oakman

This talk covers probabilistic record linkage of hospital patients: the challenges and best practices of matching patient records using IDs, approximate string matching, and machine learning algorithms, with the goal of keeping medical records accurate.

Key takeaways
  • Time is a useful dimension for matching records, as people are more likely to be in the hospital in recent timeframes than distant ones.
  • The data cleaning step is crucial and consumes most of the time when working on a record linkage project.
  • The speaker recommends trying different approaches, both deterministic and probabilistic, to find the best matching algorithm.
  • The speaker shares the algorithm used in the Luminaire system, which uses IDs such as social security number, medical record number, and visit number to match patients.
  • The speaker advises caution when applying approximate string matching algorithms, such as Levenshtein distance, to IDs.
  • The algorithm used in the Luminaire system assigns a match score based on various fields, including name, date of birth, address, and medical record number, and then uses a threshold to determine if the records are a match.
  • The speaker highlights the importance of accuracy in medical records and notes that even a small mistake can have serious consequences.
  • The speaker recommends using a machine learning algorithm to find matching weights, as it can be more accurate than choosing them by hand.
  • The speaker shares his own experience working on a record linkage project and notes that it’s important to be creative and flexible when approaching the problem.
  • The speaker emphasizes avoiding false positives and recommends applying a range of thresholds to match scores rather than a single cutoff.
  • The speaker also highlights the importance of human review and evaluation of matches to ensure accuracy.
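
To make the scoring idea in the takeaways concrete, here is a minimal Python sketch of a weighted match score with thresholds. It is not the Luminaire algorithm: the field names, the weights, the two threshold values, and the helper names (`similarity`, `match_score`, `classify`) are all assumptions chosen for illustration. Names and addresses are compared with Levenshtein distance, while date of birth and medical record number are compared exactly, reflecting the caution about fuzzy matching on IDs. Scores between the two thresholds are routed to human review instead of being accepted automatically.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (standard dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to a 0.0-1.0 similarity (1.0 = identical)."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty values carry no evidence of a match
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical weights and thresholds, picked by hand for this example only.
WEIGHTS = {"name": 3.0, "dob": 4.0, "address": 2.0, "mrn": 6.0}
MATCH_THRESHOLD = 10.0   # at or above: treat as the same patient
REVIEW_THRESHOLD = 6.0   # between the two thresholds: send to a human


def match_score(r1: dict, r2: dict) -> float:
    """Sum weighted evidence across fields."""
    score = WEIGHTS["name"] * similarity(r1["name"], r2["name"])
    score += WEIGHTS["address"] * similarity(r1["address"], r2["address"])
    # IDs and dates are compared exactly: a one-character difference
    # may well be a different, valid value.
    if r1["dob"] and r1["dob"] == r2["dob"]:
        score += WEIGHTS["dob"]
    if r1["mrn"] and r1["mrn"] == r2["mrn"]:
        score += WEIGHTS["mrn"]
    return score


def classify(score: float) -> str:
    """Map a score to match / review / non-match using two thresholds."""
    if score >= MATCH_THRESHOLD:
        return "match"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "non-match"


a = {"name": "Jon Smith", "dob": "1980-02-14", "address": "12 Oak St", "mrn": "A1001"}
b = {"name": "John Smith", "dob": "1980-02-14", "address": "12 Oak St", "mrn": "A1001"}
print(classify(match_score(a, b)))
```

In this sketch the weights are fixed constants; as the takeaways note, learning them from labeled pairs may give better results than picking them manually, and raising `MATCH_THRESHOLD` is one way to trade missed matches for fewer false positives.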