Machine Learning-Based Text Classification Comparison: Turkish Language Context

The growth in textual data associated with the increased usage of online services and the simplicity of having access to these data has resulted in a rise in the number of text classification research papers. Text classification has a significant influence on several domains such as news categorizat...

Full description

Saved in:
Bibliographic Details
Published in:Applied sciences Vol. 13; no. 16; p. 9428
Main Authors: Alzoubi, Yehia Ibrahim, Topcu, Ahmet E., Erkaya, Ahmed Enis
Format: Journal Article
Language:English
Published: Basel MDPI AG 01-08-2023
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The growth in textual data associated with the increased usage of online services and the simplicity of having access to these data has resulted in a rise in the number of text classification research papers. Text classification has a significant influence on several domains such as news categorization, the detection of spam content, and sentiment analysis. The classification of Turkish text is the focus of this work since only a few studies have been conducted in this context. We utilize data obtained from customers’ inquiries that come to an institution to evaluate the proposed techniques. Classes are assigned to such inquiries specified in the institution’s internal procedures. The Support Vector Machine, Naïve Bayes, Long Term-Short Memory, Random Forest, and Logistic Regression algorithms were used to classify the data. The performance of the various techniques was then analyzed after and before data preparation, and the results were compared. The Long Term-Short Memory technique demonstrated superior effectiveness in terms of accuracy, achieving an 84% accuracy rate, surpassing the best accuracy record of traditional techniques, which was 78% accuracy for the Support Vector Machine technique. The techniques performed better once the number of categories in the dataset was reduced. Moreover, the findings show that data preparation and coherence between the classes’ number and the number of training sets are significant variables influencing the techniques’ performance. The findings of this study and the text classification technique utilized may be applied to data in dialects other than Turkish.
ISSN:2076-3417
2076-3417
DOI:10.3390/app13169428