Profiling Entities over Time in the Presence of Unreliable Sources

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 29, no. 7, pp. 1522-1535
Main Authors: Li, Furong; Lee, Mong Li; Hsu, Wynne
Format: Journal Article
Language: English
Published: New York IEEE 01-07-2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: To harness the wealth of information available on the web today, many organizations aggregate public (and private) data to derive knowledge repositories for real-world entities. This paper aims to build historical profiles of real-world entities by integrating temporal records collected from different sources. This problem is challenging not only because entities may change their attribute values over time, but also because information provided by the sources could be unreliable. In this paper, we present a new solution for profiling entities over time. To understand the evolution of entities, we describe a novel transition model which gives the probability that an entity will change to a particular attribute value after some time period. Next, a set of quality metrics is defined for the data sources to capture the exactness and timeliness of their provided values. The transition model and the quality metrics are then built into a source-aware temporal matching algorithm that can link temporal records to entities at the right time and augment entity profiles with correct values. Our suite of experiments demonstrates that the proposed approach outperforms state-of-the-art techniques by constructing more complete and accurate profiles for entities.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2017.2684804
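
Illustrative note: the summary describes a transition model that gives the probability of an entity changing to a particular attribute value after some time period. The record does not give the paper's formulation, so the following is only a minimal frequency-counting sketch of that general idea, not the authors' model. The function names (`estimate_transitions`, `transition_probability`), the `(year, value)` input layout, and the use of whole-year gaps are all assumptions made for illustration.

```python
from collections import defaultdict


def estimate_transitions(histories):
    """Estimate P(new value | old value, time gap) by simple counting.

    `histories` is a list of entity histories; each history is a list of
    (year, value) pairs sorted by year. This layout is a hypothetical
    choice for the sketch, not something taken from the paper.
    """
    # counts[(old_value, gap)][new_value] = number of observed pairs
    counts = defaultdict(lambda: defaultdict(int))
    for history in histories:
        for i, (t1, v1) in enumerate(history):
            for t2, v2 in history[i + 1:]:
                counts[(v1, t2 - t1)][v2] += 1

    # Normalize the counts for each (old_value, gap) into probabilities.
    probs = {}
    for key, outcomes in counts.items():
        total = sum(outcomes.values())
        probs[key] = {value: n / total for value, n in outcomes.items()}
    return probs


def transition_probability(probs, old_value, gap, new_value):
    """Look up an estimated probability; unseen combinations yield 0.0."""
    return probs.get((old_value, gap), {}).get(new_value, 0.0)


if __name__ == "__main__":
    # Toy job-title histories for three hypothetical entities.
    histories = [
        [(2000, "Engineer"), (2002, "Manager")],
        [(2001, "Engineer"), (2003, "Manager")],
        [(2000, "Engineer"), (2002, "Engineer")],
    ]
    probs = estimate_transitions(histories)
    print(transition_probability(probs, "Engineer", 2, "Manager"))
```

In this toy data, two of the three entities that were "Engineer" are "Manager" two years later, so the estimate for that transition is 2/3. A source-aware matcher of the kind the summary describes would additionally weight each record by the exactness and timeliness of its source, which this sketch leaves out.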