Self-training from labeled features for sentiment analysis

Bibliographic Details
Published in: Information Processing & Management, Vol. 47, no. 4, pp. 606-616
Main Authors: He, Yulan; Zhou, Deyu
Format: Journal Article
Language: English
Published: Kidlington Elsevier Ltd 01-07-2011
Elsevier
Elsevier Science Ltd
Description
Summary:
► Extends the idea of generalized expectation to acquire self-learned features.
► Learns a sentiment classification model from labeled features.
► Outperforms existing weakly-supervised approaches to sentiment classification.
► Automatically extracts highly domain-salient polarity words.

Sentiment analysis is concerned with automatically identifying the sentiment or opinion expressed in a given piece of text. Most prior work either uses prior lexical knowledge, defined as the sentiment polarity of words, or views the task as a text classification problem and relies on labeled corpora to train a sentiment classifier. While lexicon-based approaches do not adapt well to different domains, corpus-based approaches require expensive manual annotation effort. In this paper, we propose a novel framework in which an initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labeled examples for automatic domain-specific feature acquisition. The word-class distributions of these self-learned features are estimated from the pseudo-labeled examples and are used to train another classifier by constraining the model's predictions on unlabeled instances. Experiments on both the movie-review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labeled documents.
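
The self-training loop described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the generalized expectation classifier is replaced here by a simple smoothed lexicon vote, and the lexicon, the toy documents, the function names, and the thresholds (0.7, 0.9, `min_count`) are all hypothetical. The sketch stops at estimating word-class distributions for self-learned features and does not train the second, constrained classifier.

```python
from collections import Counter

# Hypothetical toy lexicon; the paper draws on an existing sentiment lexicon.
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}


def lexicon_score(tokens):
    """Stand-in for the initial classifier: P(pos) from an add-one smoothed lexicon vote."""
    pos = sum(1 for t in tokens if LEXICON.get(t) == "pos")
    neg = sum(1 for t in tokens if LEXICON.get(t) == "neg")
    return (pos + 1) / (pos + neg + 2)


def pseudo_label(docs, threshold=0.7):
    """Keep only documents classified with confidence of at least `threshold`."""
    labeled = []
    for tokens in docs:
        p_pos = lexicon_score(tokens)
        if p_pos >= threshold:
            labeled.append((tokens, "pos"))
        elif 1 - p_pos >= threshold:
            labeled.append((tokens, "neg"))
    return labeled


def word_class_distributions(labeled, min_count=2):
    """Estimate P(class | word) from document frequencies in the pseudo-labeled set."""
    counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in labeled:
        counts[label].update(set(tokens))  # count each word once per document
    dists = {}
    for word in set(counts["pos"]) | set(counts["neg"]):
        total = counts["pos"][word] + counts["neg"][word]
        if total >= min_count:
            dists[word] = {
                "pos": counts["pos"][word] / total,
                "neg": counts["neg"][word] / total,
            }
    return dists


# Toy unlabeled corpus (hypothetical, already tokenized).
docs = [
    ["great", "plot", "gripping", "good"],
    ["good", "great", "gripping"],
    ["bad", "awful", "tedious"],
    ["awful", "bad", "tedious", "plot"],
    ["good", "bad", "plot"],  # ambiguous, so it is not pseudo-labeled
]

labeled = pseudo_label(docs)
dists = word_class_distributions(labeled)

# Self-learned features: strongly polarized words that are not in the lexicon.
self_learned = {
    word: dist
    for word, dist in dists.items()
    if word not in LEXICON and max(dist.values()) >= 0.9
}
print(sorted(self_learned))
```

In this toy run, domain words that co-occur only with one pseudo-label become candidate polarity features, while a word that appears under both labels does not. In the paper, distributions of this kind serve as constraints when training the second classifier on unlabeled instances.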
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2010.11.003