Data mining approach for dry bean seeds classification

Bibliographic Details
Published in: Smart Agricultural Technology, Vol. 5, p. 100240
Main Authors: Macuácua, Jaime Carlos, Centeno, Jorge António Silva, Amisse, Caísse
Format: Journal Article
Language: English
Published: Elsevier B.V., 01-10-2023
Description
Summary:
• Data mining with an emphasis on principal component analysis.
• Machine learning used to predict seed quality: random forest (RF), support vector machine (SVM), and k-nearest neighbors (KNN).
• Hyperparameter tuning in machine learning algorithms.
• Dataset balancing based on the synthetic minority oversampling technique (SMOTE), applied with three machine learning techniques.
• Dry bean grains.

Product quality certification is an important process in agricultural production and productivity. Traditional methods for seed quality classification have shown limitations such as complex steps, low precision, and slow inspection for large production volumes. Automatic classification techniques based on machine learning and computer vision offer fast, high-throughput solutions. Despite major advances in state-of-the-art automatic classification models, there is still a need to improve these models by incorporating other techniques. In this article, we developed a computer vision system for the automatic classification of different seed varieties. The system is based on machine learning models combined with data mining techniques, and uses a set of features related to the geometry of bean seeds, extracted from binary images. Three machine learning techniques were compared: Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). The comparison included Principal Component Analysis (PCA), hyperparameter tuning of the machine learning algorithms, and dataset balancing based on the Synthetic Minority Oversampling Technique (SMOTE). The results showed that data mining processes such as PCA, hyperparameter tuning, and the SMOTE technique help to improve the quality of the classification results. The KNN classifier performed best, with around 95% accuracy and 96% precision and recall. The best results were obtained by applying hyperparameter tuning and the SMOTE technique in the preprocessing step, which yielded an increase of around 2.6%.
The results proved that the combined use of data mining in the preprocessing step and machine learning classification methods can effectively and efficiently increase the classification accuracy and help automatic bean seed selection based on digital images. This can help small farmers and/or agricultural managers make decisions regarding seed selection to increase production.
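As an illustration of two of the steps named in the summary, the following is a minimal pure-Python sketch of SMOTE-style oversampling followed by a KNN classifier, with the number of neighbors chosen by leave-one-out accuracy. It is not the authors' implementation: the function names, the two-feature toy data, the class labels "A" and "B", and the candidate values of k are all assumptions made for this example, and PCA and the RF and SVM classifiers are left out.

```python
import math
import random
from collections import Counter


def k_nearest(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance)."""
    return sorted(candidates, key=lambda c: math.dist(point, c))[:k]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbour = rng.choice(k_nearest(base, others, k))
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def knn_predict(train_x, train_y, point, k):
    """Majority vote among the k training samples nearest to `point`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(point, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy of KNN for a given k."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)


# Hypothetical, hand-made data: two geometric features per seed.
majority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3),
            (0.9, 0.7), (1.3, 1.0), (1.0, 1.2), (0.7, 0.9)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.9, 3.3)]

# Balance the classes by adding synthetic minority samples.
synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
xs = majority + minority + synthetic
ys = ["A"] * len(majority) + ["B"] * (len(minority) + len(synthetic))

# Tiny hyperparameter search over the number of neighbours.
scores = {k: loo_accuracy(xs, ys, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In this sketch the oversampling is applied to the whole toy set for brevity. In a real evaluation, synthetic samples would normally be generated from the training split only, so that no information from held-out data leaks into the classifier.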
ISSN: 2772-3755
DOI: 10.1016/j.atech.2023.100240