Semi-Automated Extraction of New Requirements from Online Reviews for Software Product Evolution

Bibliographic Details
Published in:	2018 25th Australasian Software Engineering Conference (ASWEC), pp. 31-40
Main Authors:	Buchan, Jim; Bano, Muneera; Zowghi, Didar; Volabouth, Phonephasouk
Format:	Conference Proceeding
Language:	English
Published:	IEEE 01-11-2018
Subjects:	Data mining; Feature extraction; Feature request; Measurement; Online reviews; Ontologies; Semantics; Software; Software product line; Software requirements; Support vector machines

Description
Summary:	In order to improve and increase their utility, software products must evolve continually and incrementally to meet the new requirements of current and future users. Online reviews from users of the software provide a rich and readily available resource for discovering candidate new features for future software releases. However, it is challenging to manually analyze a large volume of potentially unstructured and noisy data to extract useful information to support software release planning decisions. This paper investigates machine learning techniques to automatically identify text that represents users' ideas for new features from their online reviews. A binary classification approach to categorize extracted text as either a feature or non-feature was evaluated experimentally. Three machine learning algorithms were evaluated in the experiments: Naïve Bayes (with multinomial and Bernoulli variants), Support Vector Machines (with linear and multinomial variants) and Logistic Regression. Variations on the configurations of k-fold cross validation, the use of n-grams and review sentiment were also experimentally evaluated. Based on binary classification of over a thousand separate reviews of two products, Trello and Jira, linear Support Vector Machines with review sentiment as an input, using an n-gram range of (1,4) together with 10-fold cross validation, gave the best performance. The results have confirmed the feasibility and accuracy of semi-automated extraction of candidate requirements from a large volume of unstructured and noisy online user reviews. The next steps planned are to experiment with machine supported grouping, prioritizing and visualizing the extracted features to best support release planners' work, as well as extending the sources of candidate requirements.
ISSN:	2377-5408
DOI:	10.1109/ASWEC.2018.00013
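
The summary mentions two configuration choices, an n-gram range of (1,4) and 10-fold cross validation. The following is a minimal, standard-library sketch of what those two terms usually mean; it is not the authors' implementation and contains no classifier. The function names `word_ngrams` and `k_fold_indices`, the whitespace tokenization, the contiguous fold layout, and the sample sentence are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple


def word_ngrams(text: str, n_min: int = 1, n_max: int = 4) -> List[str]:
    """Return every word n-gram with n_min <= n <= n_max.

    Tokenization is a plain lowercase whitespace split (an assumption;
    the record does not say how the reviews were tokenized).
    """
    tokens = text.lower().split()
    grams: List[str] = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the sentence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def k_fold_indices(n_items: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    Each item lands in exactly one test fold; the remaining items form
    the training set for that fold.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional item each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


# Hypothetical review sentence, invented for this example.
review = "Please add a calendar view"
features = word_ngrams(review)           # unigrams up to 4-grams
folds = list(k_fold_indices(25, k=10))   # 25 hypothetical labelled reviews
```

In a full pipeline, the n-grams would be turned into count or weight vectors and a classifier would be trained on each `train` split and scored on the matching `test` split, with the scores averaged across the folds.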