ENSEMBLE OF MULTIPLE kNN CLASSIFIERS FOR SOCIETAL RISK CLASSIFICATION

Bibliographic Details
Published in:	Journal of systems science and systems engineering Vol. 26; no. 4; pp. 433 - 447
Main Authors:	Chen, Jindong; Tang, Xijin
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg; Springer Nature B.V., 01-08-2017. Author affiliations: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China; China Aerospace Academy of Systems Science and Engineering, Beijing 100048, P.R. China
Subjects:	Classification; Classifiers; Complexity; Economic Theory/Quantitative Economics/Mathematical Methods; Engineering; Forum; kNN algorithm; Neural networks; Operations Research/Decision Theory; Representations; Risk perception; Cosine similarity; Classifier ensemble; String representation; Societal risk; Neural network model; Risk classification; Paragraph Vector; k-Nearest Neighbor; Ensemble; Tianya Forum; Societal risk classification

Description
Summary:	Societal risk classification is a fundamental and complex issue for societal risk perception. To conduct societal risk classification, Tianya Forum posts are selected as the data source, and four kinds of representations of BBS posts are applied: string representation, term-frequency representation, TF-IDF representation and distributed representation. Using edit distance or cosine similarity as the distance metric, four k-Nearest Neighbor (kNN) classifiers based on the different representations are developed and compared. Owing to the strength of the neural network model Paragraph Vector in capturing word order and extracting semantics, kNN based on the distributed representation generated by Paragraph Vector (kNN-PV) proves effective for societal risk classification. Furthermore, to improve classification performance, kNN-PV is combined with the other three kNN classifiers in a weighted ensemble model, with the optimal weight for each kNN classifier found by brute-force grid search. The experimental results reveal that the Macro-F of the ensemble method is significantly higher than that of kNN-PV alone.
Bibliography:	11-2983/N Societal risk classification, Tianya Forum, k-Nearest Neighbor, ensemble, Paragraph Vector
ISSN:	1004-3756 1861-9576
DOI:	10.1007/s11518-017-5346-4
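
Illustration:	The summary describes kNN classifiers built on edit distance or cosine similarity and combined through weights chosen by brute-force grid search. The following is a minimal Python sketch of that general idea, not the authors' implementation. All function names, the vote-share scoring rule, the tie-breaking rule and the weight grid (`steps`) are assumptions made for illustration; the record does not state how the paper defines them.

```python
from collections import Counter
from itertools import product
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (two-row dynamic programme)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine_distance(u, v) -> float:
    """1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def knn_votes(query, train_items, train_labels, distance, k):
    """Share of the k nearest training items that carry each label."""
    ranked = sorted(range(len(train_items)),
                    key=lambda i: distance(query, train_items[i]))
    neighbours = ranked[:k]
    counts = Counter(train_labels[i] for i in neighbours)
    return {label: n / len(neighbours) for label, n in counts.items()}


def ensemble_predict(vote_dicts, weights):
    """Weighted sum of per-classifier vote shares (assumed combination rule)."""
    scores = Counter()
    for votes, w in zip(vote_dicts, weights):
        for label, share in votes.items():
            scores[label] += w * share
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(scores), key=scores.get)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    total = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)


def grid_search_weights(per_classifier_votes, y_true, steps=5):
    """Try every weight combination on a uniform grid; keep the best Macro-F.

    per_classifier_votes[c][n] is the vote dict of classifier c for
    validation item n. The grid resolution `steps` is a hypothetical choice.
    """
    grid = [s / steps for s in range(steps + 1)]
    best_weights, best_score = None, -1.0
    for weights in product(grid, repeat=len(per_classifier_votes)):
        if sum(weights) == 0:
            continue  # an all-zero weighting carries no information
        preds = [
            ensemble_predict([votes[n] for votes in per_classifier_votes], weights)
            for n in range(len(y_true))
        ]
        score = macro_f1(y_true, preds)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

In such a setup, `knn_votes` would be called once per representation (with `edit_distance` for strings and `cosine_distance` for term-frequency, TF-IDF or Paragraph Vector vectors), and `grid_search_weights` would be run on a held-out validation set. The search cost grows as `(steps + 1)` raised to the number of classifiers, which stays manageable for four classifiers.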