Examining the Impact of Feature Selection on Classification of User Reviews in Web Pages

Bibliographic Details
Published in: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), pp. 1-8
Main Authors: Uzun, Erdinc, Ozhan, Erkan
Format: Conference Proceeding
Language: English
Published: IEEE 01-09-2018
Description
Summary: User reviews in web pages can provide useful information about the content of the page for text processing applications. Automatically extracting data from a web page is a crucial process for these applications. One method used in this process is to construct a learning model with an appropriate classification method, using features derived from the data. However, some features can be either redundant or irrelevant for this model. In this study, an imbalanced dataset containing 47 shallow text features obtained from web pages is used to extract user reviews. Various well-known feature selection techniques are then applied to reduce the number of these features, and the effects of this reduction on the classification methods are examined. The experimental results indicate that approximately half of the features are sufficient for the classification task. Additionally, the AdaBoost classifier gives the best results, with a precision of about 0.930 for review layout prediction.
DOI:10.1109/IDAP.2018.8620774
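The summary reports that well-known feature selection techniques reduced the 47 features to roughly half without hurting classification. This record does not say which techniques were used, so the following is only an illustrative sketch of one common filter method, information-gain ranking, written in plain Python. It assumes discrete (here binarized) features; the feature names (`has_rating`, `has_date`, `many_links`, `long_text`) and the eight-row, imbalanced toy dataset are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(column, labels):
    """Reduction in label entropy after splitting on one discrete feature column."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def select_top_half(rows, labels, names):
    """Rank features by information gain and keep the top half (rounded up)."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    # sorted() is stable, so ties keep their original column order.
    ranked = sorted(names, key=lambda n: scores[n], reverse=True)
    keep = math.ceil(len(names) / 2)
    return ranked[:keep], scores

# Hypothetical binarized shallow text features for eight page blocks.
names = ["has_rating", "has_date", "many_links", "long_text"]
rows = [
    (1, 1, 0, 1),  # review block
    (1, 0, 0, 1),  # review block
    (0, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 1, 0, 1),
    (0, 0, 1, 0),
    (0, 1, 0, 1),
    (0, 0, 1, 1),
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]  # imbalanced: 2 reviews out of 8 blocks

selected, scores = select_top_half(rows, labels, names)
print(selected)  # → ['has_rating', 'many_links']
```

A filter like this scores each feature independently of the classifier, which makes it cheap to run before training a model such as AdaBoost; the trade-off is that it ignores interactions between features, which wrapper or embedded selection methods can capture. With real numeric features (for example, word or link counts), a discretization step would be needed before computing information gain.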