Radius-aware approximate blank node matching using signatures
Published in: | Knowledge and information systems Vol. 50; no. 2; pp. 505 - 542 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: | London, Springer London, 01-02-2017, Springer Nature B.V |
Summary: | In the linked open data cloud, the biggest open data graph that currently exists, a remarkable percentage of the data consists of unnamed resources, also called blank nodes. Several fundamental tasks, such as graph isomorphism checking and RDF data versioning, require computing a map between the sets of blank nodes of two graphs. This map aims at minimizing the delta size, i.e. the number of change operations that are required to make the graphs isomorphic. Computing the optimal map is NP-Hard in the general case, and various approximation algorithms have been proposed. In this work, we propose a novel radius-aware signature-based algorithm that is not restricted to the direct neighborhood of the compared blank nodes. Contrary to the older algorithms, the proposed algorithm manages to decrease the deviation from the optimal solution even for graphs that contain connected blank nodes in large and dense structures. The conducted experiments over real and synthetically generated datasets (including datasets from the Billion Triple Challenge 2012 and 2014) show significantly smaller deltas. For isomorphism checking (simple RDF equivalence), with a suitable configuration of the radius, the proposed algorithm achieves optimality for 100 % of the datasets, while in non-isomorphic datasets the deltas are on average 50–75 % smaller than those of the previous algorithms. Finally, the trade-off between radius, deviation from the optimum and time efficiency is analyzed. |
---|---|
ISSN: | 0219-1377, 0219-3116 |
DOI: | 10.1007/s10115-016-0945-9 |
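
The Summary describes two ideas that a small example may make more concrete: the delta size induced by a blank node map, and a signature that looks beyond a node's direct neighborhood. The following is only an illustrative sketch, not the algorithm from the article. It assumes triples are plain Python tuples, that blank nodes are strings starting with `_:`, and that two blank nodes are paired only when their signatures are exactly equal. The function names `signature`, `match_by_signature` and `delta_size` are hypothetical.

```python
def is_blank(term):
    # Assumption: blank nodes are written as strings with the "_:" prefix.
    return isinstance(term, str) and term.startswith("_:")


def signature(graph, node, radius):
    """Sorted description of the triples within `radius` hops of `node`.

    Blank node labels are replaced by "_:" so the signature does not
    depend on how blank nodes happen to be named.
    """
    seen, frontier, parts = {node}, {node}, []
    for depth in range(1, radius + 1):
        next_frontier = set()
        for s, p, o in graph:
            if s in frontier:
                parts.append((depth, "out", p, "_:" if is_blank(o) else o))
                if is_blank(o) and o not in seen:
                    next_frontier.add(o)
            if o in frontier:
                parts.append((depth, "in", p, "_:" if is_blank(s) else s))
                if is_blank(s) and s not in seen:
                    next_frontier.add(s)
        seen |= next_frontier
        frontier = next_frontier
    return tuple(sorted(parts))


def blank_nodes(graph):
    return sorted({t for triple in graph for t in triple if is_blank(t)})


def match_by_signature(g1, g2, radius):
    """Pair blank nodes of g1 and g2 whose signatures are identical (a simplification)."""
    by_sig = {}
    for b in blank_nodes(g2):
        by_sig.setdefault(signature(g2, b, radius), []).append(b)
    mapping = {}
    for b in blank_nodes(g1):
        candidates = by_sig.get(signature(g1, b, radius), [])
        if candidates:
            mapping[b] = candidates.pop(0)
    return mapping


def delta_size(g1, g2, mapping):
    """Triple deletions plus additions needed to turn g1 (renamed via mapping) into g2."""
    renamed = {tuple(mapping.get(t, t) for t in triple) for triple in g1}
    return len(renamed - g2) + len(g2 - renamed)


# Two isomorphic graphs whose blank nodes differ only at distance 2.
g1 = {("_:a", "next", "_:b"), ("_:b", "val", "1"),
      ("_:c", "next", "_:d"), ("_:d", "val", "2")}
g2 = {("_:p", "next", "_:q"), ("_:q", "val", "2"),
      ("_:r", "next", "_:s"), ("_:s", "val", "1")}

delta_r1 = delta_size(g1, g2, match_by_signature(g1, g2, radius=1))
delta_r2 = delta_size(g1, g2, match_by_signature(g1, g2, radius=2))
```

In this toy input, `_:a` and `_:c` look identical at radius 1, so the pairing can be wrong and `delta_r1` is nonzero, while at radius 2 the signatures differ and `delta_r2` is zero. A larger radius costs more time per signature, which is the trade-off the Summary mentions.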