Two-phase schema matching in real world relational databases

Bibliographic Details
Published in:	2008 IEEE 24th International Conference on Data Engineering Workshop pp. 290 - 296
Main Authors:	Bozovic, N., Vassalos, V.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-04-2008
Subjects:	Availability; Database systems; Informatics; Internet; Machine learning; Neural networks; Ontologies; Relational databases; Training data; Voting

Description
Summary:	We propose a new approach to schema matching in relational databases that merges the hybrid and composite approaches to combining multiple individual matching techniques. In particular, we propose assigning individual matchers to two categories: "strong" matchers, which provide a priori higher-quality matches, and "weak" matchers, which may be more sensitive to their inputs and are less reliable but can still help generate some matches. Matching is correspondingly done in two phases: "strong" matches are produced by the strong matchers and combined using a simple voting combiner, and weak matchers then provide additional evidence for the attributes left unmatched (again using a voting combiner). We observe that, while many recent advances in schema matching (Madhavan et al., 2005) use composite schema matching and rely on the existence of training schemas to train combiners, in many real-world situations it is not feasible to employ learning techniques because training data (i.e., schemas or instance data) is unavailable. We hypothesize that "weak" matchers can often hurt overall accuracy if used in a "single-phase" composite matcher that does not employ learning techniques. We implement our two-phase approach in the ASID system and evaluate it using real-life schemas. The experiments validate our hypothesis regarding the negative effect of "weak" matchers and also show that ASID performs comparably to state-of-the-art systems while requiring no training schemas. We also demonstrate the benefits of a simple documentation-based matcher. Our experimental data included schemas ranging from 20 to 120 attributes. Note that schemas with 120 attributes are as large as or larger than those used in other published evaluations of relational schema matching.
ISBN:	1424421616 9781424421619
DOI:	10.1109/ICDEW.2008.4498334
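
Illustrative sketch (not from the paper): the two-phase idea in the summary can be outlined in a few lines of Python. Everything named here is a hypothetical stand-in rather than part of ASID: the `vote` and `two_phase_match` functions, the `min_votes` threshold, and the two toy matchers (`exact_name` as a "strong" matcher, `shared_prefix` as a "weak" one). The paper's actual matchers and combiner details are not given in this record, so this only shows the control flow: strong matchers vote first, and weak matchers vote only on attributes left unmatched.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# A matcher proposes one target attribute for a source attribute, or None.
# This signature is an assumption made for the sketch.
Matcher = Callable[[str, List[str]], Optional[str]]


def vote(attr: str, targets: List[str], matchers: List[Matcher],
         min_votes: int = 1) -> Optional[str]:
    """Simple voting combiner: return the most-voted target, if any."""
    votes = Counter()
    for matcher in matchers:
        candidate = matcher(attr, targets)
        if candidate is not None:
            votes[candidate] += 1
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count >= min_votes else None


def two_phase_match(source: List[str], targets: List[str],
                    strong: List[Matcher],
                    weak: List[Matcher]) -> Dict[str, str]:
    """Phase 1 uses strong matchers; phase 2 uses weak ones on leftovers."""
    matches: Dict[str, str] = {}
    # Phase 1: only strong matchers vote.
    for attr in source:
        target = vote(attr, targets, strong)
        if target is not None:
            matches[attr] = target
    # Phase 2: weak matchers vote only for attributes still unmatched.
    for attr in source:
        if attr in matches:
            continue
        target = vote(attr, targets, weak)
        if target is not None:
            matches[attr] = target
    return matches


# Toy matchers, hypothetical and deliberately simplistic.
def exact_name(attr: str, targets: List[str]) -> Optional[str]:
    """'Strong' stand-in: case-insensitive exact name equality."""
    for t in targets:
        if t.lower() == attr.lower():
            return t
    return None


def shared_prefix(attr: str, targets: List[str]) -> Optional[str]:
    """'Weak' stand-in: first target sharing a three-letter prefix."""
    for t in targets:
        if t.lower()[:3] == attr.lower()[:3]:
            return t
    return None


if __name__ == "__main__":
    src = ["cust_id", "name", "tel"]
    tgt = ["cust_id", "name", "telephone"]
    print(two_phase_match(src, tgt, [exact_name], [shared_prefix]))
```

In this toy run the strong matcher settles `cust_id` and `name`, and the weak matcher is consulted only for `tel`. A single-phase variant would instead pool all matchers into one vote for every attribute, which is the setting in which the summary hypothesizes that weak matchers can hurt accuracy.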