ENSEMBLE OF MULTIPLE kNN CLASSIFIERS FOR SOCIETAL RISK CLASSIFICATION

Societal risk classification is a fundamental and complex issue for societal risk perception. To conduct societal risk classification, Tianya Forum posts are selected as the data source, and four kinds of representations: string representation, term-frequency representation, TF-IDF representation an...

Full description

Saved in:
Bibliographic Details
Published in:Journal of systems science and systems engineering Vol. 26; no. 4; pp. 433 - 447
Main Authors: Chen, Jindong, Tang, Xijin
Format: Journal Article
Language:English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 01-08-2017
Springer Nature B.V
Academy of Mathematics and Systems Science, Chinese Academy of Sciences,Beijing, 100190, P.R.China
China Aerospace Academy of Systems Science and Engineering, Beijing, 100048, P.R.China%Academy of Mathematics and Systems Science, Chinese Academy of Sciences,Beijing, 100190, P.R.China
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Societal risk classification is a fundamental and complex issue for societal risk perception. To conduct societal risk classification, Tianya Forum posts are selected as the data source, and four kinds of representations: string representation, term-frequency representation, TF-IDF representation and the distributed representation of BBS posts are applied. Using edit distance or cosine similarity as distance metric, four k-Nearest Neighbor (kNN) classifiers based on different representations are developed and compared. Owing to the priority of word order and semantic extraction of the neural network model Paragraph Vector, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) shows effectiveness for societal risk classification. Furthermore, to improve the performance of societal risk classification, through different weights, kNN-PV is combined with other three kNN classifiers as an ensemble model. Through brute force grid search method, the optimal weights are assigned to different kNN classifiers. Compared with kNN-PV, the experimental results reveal that Macro-F of the ensemble method is significantly improved for societal risk classification.
Bibliography:Societal risk classification is a fundamental and complex issue for societal risk perception. To conduct societal risk classification, Tianya Forum posts are selected as the data source, and four kinds of representations: string representation, term-frequency representation, TF-IDF representation and the distributed representation of BBS posts are applied. Using edit distance or cosine similarity as distance metric, four k-Nearest Neighbor (kNN) classifiers based on different representations are developed and compared. Owing to the priority of word order and semantic extraction of the neural network model Paragraph Vector, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) shows effectiveness for societal risk classification. Furthermore, to improve the performance of societal risk classification, through different weights, kNN-PV is combined with other three kNN classifiers as an ensemble model. Through brute force grid search method, the optimal weights are assigned to different kNN classifiers. Compared with kNN-PV, the experimental results reveal that Macro-F of the ensemble method is significantly improved for societal risk classification.
11-2983/N
Societal risk classification, Tianya Forum, k-Nearest Neighbor, ensemble, Paragraph Vector
ISSN:1004-3756
1861-9576
DOI:10.1007/s11518-017-5346-4