Detecting spatio-temporal outliers with kernels and statistical testing
Published in: | 2009 17th International Conference on Geoinformatics pp. 1 - 6 |
---|---|
Main Authors: | , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-08-2009 |
Subjects: | |
Summary: | Outlier detection is the discovery of points that are exceptional when compared with a set of observations that are considered normal. Such points are important because they often lead to the discovery of exceptional events. In spatio-temporal data, observations are vectors of feature values, tagged with a geographical location and a timestamp. A spatio-temporal outlier is an observation whose attribute values are significantly different from those of other spatially and temporally referenced objects in a spatio-temporal neighborhood. It represents an object that is significantly different from its neighbors, even though it may not be significantly different from the entire population. The discovery of outliers in spatio-temporal data is therefore complicated by the need to focus the search on appropriate spatio-temporal neighborhoods of points. The work in this paper leverages an algorithm, StrOUD (strangeness-based outlier detection algorithm), that has been developed and used by the authors to detect outliers in various scenarios (including vector spaces and non-vectorial data). StrOUD uses a measure of strangeness to characterize an observation, and compares the strangeness of a point with the distribution of strangeness of a set of baseline observations (which are assumed to come mostly from normal points). Using statistical testing, StrOUD determines whether the point is an outlier. The technique described in this paper defines strangeness as the sum of distances to nearest neighbors, where the distance between two observations is computed as a weighted combination of the distance between their vectors of features, their geographical distance, and their temporal distance. Using this multi-modal distance measure (hereafter called a kernel), our technique is able to diagnose outliers with respect to spatio-temporal neighborhoods. 
We show that our approach can detect outliers in real-life data, including crime data and a set of observations collected by buoys in the Gulf of Mexico during the 2005 hurricane season. We also show that varying the weights on the kernel distances allows the user to adapt the size of spatio-temporal neighborhoods. |
---|---|
ISBN: | 1424445620 9781424445622 |
ISSN: | 2161-024X |
DOI: | 10.1109/GEOINFORMATICS.2009.5293481 |
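
The summary describes the method only in words, so the following is a minimal Python sketch of the idea, not the authors' implementation. It assumes Euclidean distance for both the feature vectors and the (planar) locations, absolute difference for timestamps, and an empirical p-value that counts how many baseline strangeness values are at least as large as the test point's. The function names, the weights, the value of `k`, and the 0.05 threshold are all illustrative choices that do not come from the record.

```python
import numpy as np

def combined_distance(feat_a, loc_a, t_a, feat_b, loc_b, t_b,
                      weights=(1.0, 1.0, 1.0)):
    """Pairwise weighted distance between observation sets A (rows) and B (columns).

    Assumption: Euclidean feature and spatial distances, absolute time difference.
    """
    w_feat, w_space, w_time = weights
    d_feat = np.linalg.norm(feat_a[:, None, :] - feat_b[None, :, :], axis=2)
    d_space = np.linalg.norm(loc_a[:, None, :] - loc_b[None, :, :], axis=2)
    d_time = np.abs(t_a[:, None] - t_b[None, :])
    return w_feat * d_feat + w_space * d_space + w_time * d_time

def strangeness(dist, k, skip_self=False):
    """Sum of the k smallest distances in each row of a distance matrix."""
    ordered = np.sort(dist, axis=1)
    first = 1 if skip_self else 0  # drop the zero self-distance for baseline points
    return ordered[:, first:first + k].sum(axis=1)

def p_values(test_s, baseline_s):
    """Empirical p-value: share of baseline points at least as strange (assumed test)."""
    baseline_s = np.asarray(baseline_s)
    counts = (baseline_s[None, :] >= np.asarray(test_s)[:, None]).sum(axis=1)
    return (counts + 1) / (len(baseline_s) + 1)

# Toy baseline: 20 stations on a line, identical feature value, same timestamp.
n = 20
base_feat = np.zeros((n, 1))
base_loc = np.column_stack([np.arange(n, dtype=float), np.zeros(n)])
base_t = np.zeros(n)

# Two test observations at the same place and time; the second has an unusual feature.
test_feat = np.array([[0.0], [50.0]])
test_loc = np.array([[10.5, 0.0], [10.5, 0.0]])
test_t = np.zeros(2)

k = 2
base_s = strangeness(
    combined_distance(base_feat, base_loc, base_t, base_feat, base_loc, base_t),
    k, skip_self=True)
test_s = strangeness(
    combined_distance(test_feat, test_loc, test_t, base_feat, base_loc, base_t), k)

p = p_values(test_s, base_s)
is_outlier = p < 0.05  # illustrative significance level
print(p, is_outlier)
```

Raising the spatial or temporal weight makes distant or old observations count as farther away, which effectively shrinks the neighborhood that the nearest-neighbor sum draws on; lowering it has the opposite effect.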