Evaluating semantic similarity and relatedness between concepts by combining taxonomic and non-taxonomic semantic features of WordNet and Wikipedia

Many applications in cognitive science and artificial intelligence utilize semantic similarity and relatedness to solve difficult tasks such as information retrieval, word sense disambiguation, and text classification. Previously, several approaches for evaluating concept similarity and relatedness...

Full description

Saved in:
Bibliographic Details
Published in:Information sciences Vol. 625; pp. 673 - 699
Main Authors: Hussain, Muhammad Jawad, Bai, Heming, Wasti, Shahbaz Hassan, Huang, Guangjian, Jiang, Yuncheng
Format: Journal Article
Language:English
Published: Elsevier Inc 01-05-2023
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Many applications in cognitive science and artificial intelligence utilize semantic similarity and relatedness to solve difficult tasks such as information retrieval, word sense disambiguation, and text classification. Previously, several approaches for evaluating concept similarity and relatedness based on WordNet or Wikipedia have been proposed. WordNet-based methods rely on highly precise knowledge but have limited lexical coverage. In contrast, Wikipedia-based models achieve more coverage but sacrifice knowledge quality. Therefore, in this paper, we focus on developing a comprehensive semantic similarity and relatedness method based on WordNet and Wikipedia. To improve the accuracy of existing measures, we combine various taxonomic and non-taxonomic features of WordNet, including gloss, lemmas, examples, sister-terms, derivations, holonyms/meronyms, and hypernyms/hyponyms, with Wikipedia gloss and hyperlinks, to describe concepts. We present a novel technique for extracting ‘is-a’ and ‘part-whole’ relationships between concepts using the Wikipedia link structure. The suggested technique identifies taxonomic and non-taxonomic relationships between concepts and offers dense vector representations of concepts. To fully exploit WordNet and Wikipedia’s semantic attributes, the proposed method integrates their semantic knowledge at feature-level, combining semantic similarity and relatedness into a single comprehensive measure. The experimental results demonstrate the effectiveness of the proposed method over state-of-the-art measures on various gold standard benchmarks.
ISSN:0020-0255
1872-6291
DOI:10.1016/j.ins.2023.01.007