Text Compression for Sentiment Analysis via Evolutionary Algorithms
Main Authors: | , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 20-09-2017 |
Summary: | Can textual data be compressed intelligently without losing accuracy in
evaluating sentiment? In this study, we propose a novel evolutionary
compression algorithm, PARSEC (PARts-of-Speech for sEntiment Compression),
which makes use of Parts-of-Speech tags to compress text in a way that
sacrifices minimal classification accuracy when used in conjunction with
sentiment analysis algorithms. An analysis of PARSEC with eight commercial and
non-commercial sentiment analysis algorithms on twelve English sentiment data
sets reveals that accurate compression is possible with (0%, 1.3%, 3.3%) loss
in sentiment classification accuracy for (20%, 50%, 75%) data compression with
PARSEC using LingPipe, the most accurate of the sentiment algorithms. Other
sentiment analysis algorithms are more severely affected by compression. We
conclude that significant compression of text data is possible for sentiment
analysis depending on the accuracy demands of the specific application and the
specific sentiment analysis algorithm used. |
---|---|
DOI: | 10.48550/arxiv.1709.06990 |