DocSemMap: Leveraging Textual Data Documentations for Mapping Structured Data Sets into Knowledge Graphs

Today, knowledge graphs have been proven to enable the efficient integration of heterogeneous data sets. An important step in creating such knowledge graphs is the mapping of the attributes of a data set to the knowledge graph's ontology. So far, numerous methods have been developed to support...

Full description

Saved in:
Bibliographic Details
Published in:2022 IEEE 16th International Conference on Semantic Computing (ICSC) pp. 209 - 216
Main Authors: Burgdorf, Andreas, Paulus, Alexander, Pomp, Andre, Meisen, Tobias
Format: Conference Proceeding
Language:English
Published: IEEE 01-01-2022
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Today, knowledge graphs have been proven to enable the efficient integration of heterogeneous data sets. An important step in creating such knowledge graphs is the mapping of the attributes of a data set to the knowledge graph's ontology. So far, numerous methods have been developed to support this mapping process by using both the schema information as well as the actual data values from a data set in conjunction with external knowledge bases or machine learning approaches. A third source of information, namely textual data documentations, has not yet been considered. In this paper, we present DocSemMap, a novel approach that utilizes textual data documentations of data sets as an additional source for the creation of semantic mappings. We train custom embeddings on the textual data documentations. Further, we utilize pre-trained embeddings that allow us to identify similarities between excerpts of the textual data documentations and descriptions of ontological concepts. Based on this, we build candidate sets of the best suitable concepts for mapping and finally use weighted similarity scores to identify the best fitting concept for each attribute of a data set. The evaluation of our approach outperforms existing approaches for semantic mapping but still has potential for improvement.
DOI:10.1109/ICSC52841.2022.00042