Data-Driven Answer Selection in Community QA Systems

Bibliographic Details
Published in:	IEEE transactions on knowledge and data engineering Vol. 29; no. 6; pp. 1186 - 1198
Main Authors:	Nie, Liqiang; Wei, Xiaochi; Zhang, Dongxiang; Wang, Xiang; Gao, Zhipeng; Yang, Yi
Format:	Journal Article
Language:	English
Published:	New York IEEE 01-06-2017 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Analytical models; Answer selection; Archives & records; Candidates; Closed-form solutions; Communities; Community-based question answering; Computer science; Feature extraction; Knowledge discovery; Observation-guided training set construction; Portals; Training

Description
Summary:	Finding similar questions in historical archives has been applied to question answering, with sound theoretical underpinnings and great practical success. Nevertheless, each question in the returned candidate pool is often associated with multiple answers, so users have to browse painstakingly through many of them before finding the correct one. To alleviate this problem, we present a novel scheme to rank answer candidates via pairwise comparisons. It consists of one offline learning component and one online search component. In the offline learning component, we first automatically establish positive, negative, and neutral training samples in the form of preference pairs, guided by our data-driven observations. We then present a novel model that jointly incorporates these three types of training samples, and we derive its closed-form solution. In the online search component, we first collect a pool of answer candidates for the given question by finding its similar questions. We then sort the answer candidates by using the offline-trained model to judge their preference orders. Extensive comparative experiments on real-world vertical and general community-based question answering datasets demonstrate its robustness and promising performance. We have also released the code and data to facilitate other researchers.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2017.2669982
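
Illustrative sketch:	The summary describes an offline step that learns from positive, negative, and neutral preference pairs with a closed-form solution, and an online step that sorts answer candidates by pairwise preference. The Python sketch below shows one simple way such a pipeline could look. It is not the authors' model: it substitutes ordinary ridge regression on feature differences, and the function names, the toy features, the targets +1, -1, and 0, and the regularization weight `lam` are all assumptions made for illustration.

```python
import numpy as np


def fit_pairwise_ridge(diffs, labels, lam=1.0):
    """Offline step (assumed stand-in model, not the paper's).

    diffs  : (n_pairs, d) rows of features(answer_i) - features(answer_j)
    labels : +1 if answer_i is preferred, -1 if answer_j is preferred,
             0 for a neutral pair (no preference)
    Returns the ridge-regression weights in closed form:
        w = (X^T X + lam * I)^{-1} X^T y
    """
    X = np.asarray(diffs, dtype=float)
    y = np.asarray(labels, dtype=float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def rank_candidates(features, w):
    """Online step: order candidates by the number of pairwise wins.

    Candidate i beats candidate j when w . (f_i - f_j) > 0.
    Returns candidate indices, best first.
    """
    F = np.asarray(features, dtype=float)
    scores = F @ w
    # wins[i] = number of candidates j that candidate i is preferred over
    wins = (scores[:, None] > scores[None, :]).sum(axis=1)
    return np.argsort(-wins, kind="stable").tolist()


# Hypothetical toy data: feature 0 tracks answer quality, feature 1 is noise.
pair_diffs = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
pair_labels = [1, -1, 0]  # positive, negative, neutral
w = fit_pairwise_ridge(pair_diffs, pair_labels, lam=1.0)

candidate_features = [[0.1, 5.0], [0.9, 0.0], [0.5, 2.0]]
order = rank_candidates(candidate_features, w)
print(order)  # [1, 2, 0]
```

Because this stand-in model is linear, counting pairwise wins yields the same order as sorting by the score `F @ w`; the pairwise form is kept only to mirror the wording of the summary.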