Creating Synthetic Geospatial Patient Data to Mimic Real Data Whilst Preserving Privacy: 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS)
Published in: | 2023 IEEE 36th International Symposium on Computer-Based Medical Systems (CBMS) pp. 7 - 12 |
---|---|
Main Authors: | , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-06-2023 |
Subjects: | |
Summary: | Synthetic Individual-Level Geospatial Data (SIL-GSD) offers several advantages in spatial epidemiology over census data or surveys conducted at regional or global levels. SIL-GSD could bring a new dimension to the study of the patterns and causes of disease in a particular location while minimizing the risk of disclosing patient identity, especially for rare conditions. It could also help in building and monitoring regional machine learning models, improving the quality and effectiveness of local healthcare services. Finally, SIL-GSD could help in controlling the spread of diseases, and in understanding their causes, by studying disease movement across areas through the travelling patterns of populations. To our knowledge, no synthetic health records data containing synthesised geographic locations for patients has so far been published for research purposes. In this paper we therefore explore generating SIL-GSD by allocating synthetic patients to general practices (healthcare providers) in the UK using the demographics and prevalence of health conditions in each practice. The assigned general practice locations can be used as proxies for patient locations because people are registered with the practice nearest their home. We use high-fidelity synthetic primary care patients from the Clinical Practice Research Datalink (CPRD) and allocate them to England's general practices (GPs), using the publicly available GP health conditions statistics from the Quality and Outcomes Framework (QOF). The allocation relies on similarities between patients in different locations without using real location information for the patients. We demonstrate that the Allocation Data accurately mimics the real distribution of health conditions in the general practices and also preserves the underlying distribution of the original primary care patient data from CPRD (Gold Standard). |
---|---|
ISSN: | 2372-9198 |
DOI: | 10.1109/CBMS58004.2023.00183 |
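
The summary describes allocating synthetic patients to general practices so that each practice reflects its published prevalence of health conditions. The following is a minimal illustrative sketch of that general idea only, not the authors' method: it assumes a single condition, ignores demographics, and uses a hypothetical schema (`id`, `has_condition`, `code`, `list_size`, `prevalence`) and made-up numbers.

```python
import random


def allocate_patients(patients, practices, seed=0):
    """Assign synthetic patients to practices so that each practice's
    share of patients with the condition matches its target prevalence.

    patients:  list of dicts with 'id' and 'has_condition' (hypothetical schema)
    practices: list of dicts with 'code', 'list_size', 'prevalence' (hypothetical schema)
    Returns a dict mapping practice code -> list of patient ids.
    """
    rng = random.Random(seed)
    cases = [p["id"] for p in patients if p["has_condition"]]
    controls = [p["id"] for p in patients if not p["has_condition"]]
    rng.shuffle(cases)
    rng.shuffle(controls)

    allocation = {}
    for practice in practices:
        # Target number of patients with the condition in this practice.
        n_cases = round(practice["list_size"] * practice["prevalence"])
        n_controls = practice["list_size"] - n_cases
        if n_cases > len(cases) or n_controls > len(controls):
            raise ValueError(f"not enough synthetic patients for {practice['code']}")
        # Pop from the shuffled pools so no patient is assigned twice.
        chosen = [cases.pop() for _ in range(n_cases)]
        chosen += [controls.pop() for _ in range(n_controls)]
        allocation[practice["code"]] = chosen
    return allocation


# Made-up example data: every fifth synthetic patient has the condition.
patients = [{"id": i, "has_condition": i % 5 == 0} for i in range(1000)]
practices = [
    {"code": "GP_A", "list_size": 100, "prevalence": 0.10},
    {"code": "GP_B", "list_size": 200, "prevalence": 0.25},
]
allocation = allocate_patients(patients, practices)
```

Note that no real patient location enters the procedure; the practice a synthetic patient lands in is determined only by the practice-level statistics, which is the property the summary emphasises. A fuller treatment would presumably match on several conditions and demographic attributes at once.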