A Machine Learning Based Topic Exploration and Categorization on Surveys

Bibliographic Details
Published in: 2012 11th International Conference on Machine Learning and Applications Vol. 2; pp. 7-12
Main Authors: George, C. P., Wang, D. Z., Wilson, J. N., Epstein, L. M., Garland, P., Suh, A.
Format: Conference Proceeding
Language: English
Published: IEEE 01-12-2012
Description
Summary: This paper describes a model for automatic topic extraction, categorization, and relevance ranking of multilingual surveys and questions, built on machine learning techniques such as topic modeling and fuzzy clustering. The automatically generated question and survey categories are used to build question banks and category-specific survey templates. First, we describe the pre-processing steps we considered for removing noise from the multilingual survey text. Second, we explain our strategy for automatically extracting survey categories using topic models. Third, we describe different methods for clustering questions under survey categories and grouping them by relevance. Last, we describe our experimental results on a large collection of unique, real-world survey datasets in German, Spanish, French, and Portuguese, along with our refinement methods for determining meaningful and sensible categories for building question banks. We conclude with possible enhancements to the current system and its potential impact in the business domain.
ISBN: 1467346519, 9781467346511
DOI: 10.1109/ICMLA.2012.132
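
Illustrative note: the summary mentions fuzzy clustering of questions under survey categories but does not name a specific algorithm. The sketch below uses fuzzy c-means, one common fuzzy clustering method, as a stand-in; it is not taken from the paper. The bag-of-words features, the sample questions, the choice of initial centers, and every function name are assumptions made for illustration (the paper's features would presumably come from its topic models).

```python
import math
from collections import Counter


def bag_of_words(text, vocab):
    """Turn a question into a word-count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memberships(points, centers, m=2.0, eps=1e-9):
    """Fuzzy c-means membership update.

    u[i][j] = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)),
    where d_ij is the distance from point i to center j.
    """
    power = 2.0 / (m - 1.0)
    result = []
    for p in points:
        # Clamp distances so a point sitting exactly on a center does not divide by zero.
        dists = [max(distance(p, c), eps) for c in centers]
        result.append(
            [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        )
    return result


def update_centers(points, u, m=2.0):
    """Recompute each center as the mean of all points weighted by u[i][j] ** m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [list(c) for c in init_centers]
    for _ in range(iterations):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return memberships(points, centers, m), centers


# Hypothetical sample questions: two about a product, two about a manager.
questions = [
    "rate the product quality",
    "rate the product price",
    "rate your manager support",
    "rate your manager feedback",
]
vocab = sorted({w for q in questions for w in q.split()})
points = [bag_of_words(q, vocab) for q in questions]

# Deterministic start: seed one center on a product question and one on a manager question.
u, centers = fuzzy_c_means(points, init_centers=[points[0], points[2]])
```

Each row of `u` holds one question's degree of membership in each cluster, so a question can belong partly to several categories rather than being forced into one. Sorting questions by their membership in a given cluster is one simple way to approximate the relevance grouping the summary describes.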