On discovering co-location patterns in datasets: a case study of pollutants and child cancers

Bibliographic Details
Published in:	GeoInformatica Vol. 20; no. 4; pp. 651 - 692
Main Authors:	Li, Jundong; Adilmagambetov, Aibek; Mohomed Jabbar, Mohomed Shazan; Zaïane, Osmar R.; Osornio-Vargas, Alvaro; Wine, Osnat
Format:	Journal Article
Language:	English
Published:	New York: Springer US, 01-10-2016 (Springer; Springer Nature B.V.)
Subjects:	Air pollution; Algorithms; Analysis; Cancer; Case studies; Childrens health; Computer Science; Data mining; Data points; Data Structures and Information Theory; Datasets; Earth and Environmental Science; Environmental health; Geographical Information Systems/Cartography; Geography; Geospatial data; Health aspects; Information Storage and Retrieval; Multimedia Information Systems; Pattern analysis; Pollutants; Pollution studies; Statistical tests; Thresholds; Co-location mining; Air pollutant and environmental health; Association rule and frequent pattern mining; Uncertain data mining

Description
Summary:	We intend to identify relationships between cancer cases and pollutant emissions by proposing a novel co-location mining algorithm. In this context, we specifically attempt to understand whether there is a relationship between the location of a child diagnosed with cancer and any chemical combinations emitted from various facilities in that particular location. Co-location pattern mining aims to detect sets of spatial features frequently located in close proximity to each other. Most previous work in this domain is based on transaction-free, Apriori-like algorithms that depend on user-defined thresholds and are designed for Boolean data points. Because there is no clear notion of transactions, it is nontrivial to use association rule mining techniques to tackle the co-location mining problem. Our proposed approach is built on a grid-based "transactionization" of the geographic space and is designed to mine datasets with extended spatial objects. It can also incorporate uncertainty about the existence of features, which models real-world scenarios more accurately. We eliminate the need for a global threshold by introducing a statistical test to validate the significance of candidate co-location patterns and rules. Experiments on both synthetic and real datasets reveal that our algorithm can detect a considerable number of statistically significant co-location patterns. In addition, we explain the data modelling framework used on real datasets of pollutants (PRTR/NPRI) and childhood cancer cases.
ISSN:	1384-6175; 1573-7624
DOI:	10.1007/s10707-016-0254-1
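
The summary describes three moving parts: a grid that turns geographic space into transactions, per-feature existence probabilities, and a statistical test that replaces a global support threshold. The Python sketch below shows one way these pieces could fit together. It is not the authors' algorithm. The uniform square grid, point-only objects (no extended objects), the expected-support formula (which assumes features occur independently within a cell), the permutation test used as the significance test, the synthetic data, and all function names are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import prod


def transactionize(observations, cell_size):
    """Turn (feature, x, y, prob) observations into grid-cell transactions.

    Each transaction maps a feature to its existence probability in that cell.
    Assumptions for this sketch: point objects only, and duplicate features in a
    cell keep their highest probability.
    """
    cells = defaultdict(dict)
    for feature, x, y, p in observations:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key][feature] = max(p, cells[key].get(feature, 0.0))
    return list(cells.values())


def expected_support(transactions, pattern):
    """Expected number of cells containing every feature of `pattern`,
    assuming features exist independently of one another within a cell."""
    return sum(prod(t.get(f, 0.0) for f in pattern) for t in transactions)


def permutation_p_value(transactions, pattern, n_perm=999, seed=0):
    """Randomization test (swapped in for illustration): shuffle each feature's
    probability column across cells independently, which keeps every feature's
    marginal distribution but breaks co-location, then count how often the
    shuffled support reaches the observed one."""
    rng = random.Random(seed)
    observed = expected_support(transactions, pattern)
    n = len(transactions)
    columns = {f: [t.get(f, 0.0) for t in transactions] for f in pattern}
    hits = 0
    for _ in range(n_perm):
        shuffled = {f: rng.sample(col, n) for f, col in columns.items()}
        null = sum(prod(shuffled[f][i] for f in pattern) for i in range(n))
        if null >= observed - 1e-12:  # tolerance for float summation order
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def significant_patterns(transactions, max_size=2, alpha=0.05, n_perm=999, seed=0):
    """Test every candidate feature set up to `max_size` and keep those whose
    p-value is at most `alpha`. No multiple-testing correction is applied."""
    features = sorted({f for t in transactions for f in t})
    results = []
    for k in range(2, max_size + 1):
        for pattern in combinations(features, k):
            support, p = permutation_p_value(transactions, pattern, n_perm, seed)
            if support > 0 and p <= alpha:
                results.append((pattern, support, p))
    return results


if __name__ == "__main__":
    # Hypothetical synthetic data: "cancer" cases placed next to "benzene"
    # emitters, "lead" emitters scattered uniformly; probabilities are made up.
    rng = random.Random(42)
    obs = []
    for _ in range(60):
        x, y = rng.uniform(0, 100), rng.uniform(0, 100)
        obs.append(("benzene", x, y, rng.uniform(0.6, 1.0)))
        obs.append(("cancer", x + rng.uniform(-2, 2), y + rng.uniform(-2, 2), 1.0))
        obs.append(("lead", rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0.6, 1.0)))
    txns = transactionize(obs, cell_size=10.0)
    for pattern, support, p in significant_patterns(txns):
        print(pattern, round(support, 2), p)
```

Shuffling each feature's column independently preserves how often, and with what probability, each feature appears. A small p-value therefore points to co-location beyond what those marginal frequencies alone would produce, without requiring a user-chosen support threshold.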