Classification of Poverty Condition Using Natural Language Processing

Bibliographic Details
Published in: Social Indicators Research, Vol. 162, no. 3, pp. 1413-1435
Main Authors: Muñetón-Santa, Guberney; Escobar-Grisales, Daniel; López-Pabón, Felipe Orlando; Pérez-Toro, Paula Andrea; Orozco-Arroyave, Juan Rafael
Format: Journal Article
Language: English
Published: Dordrecht Springer Netherlands 01-08-2022
Springer Nature B.V.
Description
Summary: This work introduces a methodology for distinguishing between poor and extremely poor people using Natural Language Processing. The approach serves as a baseline for understanding and classifying poverty from people's discourse using machine learning algorithms. Building on classical and modern word vector representations, we propose two strategies for document-level representation: (1) document-level features based on the concatenation of descriptive statistics and (2) Gaussian mixture models. Three classification methods are systematically evaluated: Support Vector Machines, Random Forest, and Extreme Gradient Boosting. The four best experiments yielded an accuracy of around 55%, while the embeddings based on GloVe word vectors yielded a sensitivity of 79.6%, which could be of great interest to public policy makers who need to accurately identify the people to be prioritized in social programs.
ISSN: 0303-8300, 1573-0921
DOI: 10.1007/s11205-022-02883-z
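
The first document-level strategy named in the summary, concatenating descriptive statistics, can be illustrated with a minimal sketch. This is not the authors' implementation. The function name document_features, the choice of statistics (mean, standard deviation, minimum, maximum), and the toy two-dimensional vectors are all assumptions made for illustration; the record does not say which statistics the paper uses.

```python
from statistics import fmean, pstdev
from typing import Sequence


def document_features(word_vectors: Sequence[Sequence[float]]) -> list[float]:
    """Pool per-word vectors into one fixed-length document vector.

    Hypothetical helper: the set of statistics is an assumption,
    not taken from the paper.
    """
    if not word_vectors:
        raise ValueError("document has no word vectors")
    # Transpose so each tuple holds one embedding dimension across all words.
    dims = list(zip(*word_vectors))
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]  # population standard deviation
    mins = [min(d) for d in dims]
    maxs = [max(d) for d in dims]
    # Concatenate the statistics: the output length is 4 * embedding dimension,
    # regardless of how many words the document contains.
    return means + stds + mins + maxs


# Toy 2-dimensional "embeddings" for a three-word document (made-up values).
doc = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
features = document_features(doc)
```

Because the output length depends only on the embedding dimension, documents of different lengths map to vectors of the same size, which is what classifiers such as the Support Vector Machines, Random Forest, and Extreme Gradient Boosting models named in the summary require as input.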