Examining the Impact of Feature Selection on Classification of User Reviews in Web Pages
Published in: | 2018 International Conference on Artificial Intelligence and Data Processing (IDAP) pp. 1 - 8 |
---|---|
Main Authors: | , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-09-2018 |
Subjects: | |
Summary: | User reviews in web pages can provide useful information about the content of a page for text processing applications. Automatically extracting data from a web page is a crucial step for these applications. One common approach is to build a learning model with an appropriate classification method, using features derived from the data. However, some features can be redundant or irrelevant for this model. In this study, an imbalanced dataset of 47 shallow text features obtained from web pages is used to extract user reviews. Various well-known feature selection techniques are then applied to reduce the number of features, and the effects of this reduction on the classification methods are examined. The experimental results indicate that approximately half of the features are sufficient for the classification task. Additionally, the AdaBoost classifier gives the best results, with a precision of about 0.930 for review layout prediction. |
---|---|
DOI: | 10.1109/IDAP.2018.8620774 |
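
The summary describes reducing 47 features to roughly half and judging the result by precision. The sketch below illustrates that general idea only, and it is not the authors' method: the data is synthetic, the scoring rule (a standardized gap between class means) is a simple filter swapped in for the unnamed selection techniques, and no AdaBoost classifier is trained. All function names and parameters are hypothetical.

```python
import random
import statistics


def make_imbalanced_data(n_samples=400, n_features=47, n_informative=10,
                         pos_ratio=0.1, seed=0):
    """Synthetic stand-in data (assumption): only the first
    n_informative features differ between the two classes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        label = 1 if rng.random() < pos_ratio else 0
        row = [rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def filter_scores(X, y):
    """Score each feature by the gap between class means,
    divided by the feature's overall standard deviation."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = statistics.pstdev(pos + neg) or 1.0
        gap = abs(statistics.fmean(pos) - statistics.fmean(neg))
        scores.append(gap / spread)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, in index order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def precision(y_true, y_pred):
    """Precision = true positives / all predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


X, y = make_imbalanced_data()
scores = filter_scores(X, y)
selected = select_top_k(scores, k=len(scores) // 2)  # keep about half of 47
print(f"kept {len(selected)} of {len(scores)} features")
```

Precision is a natural metric for an imbalanced dataset like the one described, because it reports how many predicted review blocks are actually reviews rather than rewarding a model for favoring the majority class.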