Email Classification: A Case Study

Bibliographic Details
Main Author: da Silva, André Ricardo Azevedo Gonçalves
Format: Dissertation
Language: English
Published: ProQuest Dissertations & Theses 01-01-2016
Description
Summary: Reliance on email has been common since the early days of the Internet. Today, electronic mail is widely used in both professional and personal contexts. Although the service was developed as a means of communication, it now serves many other purposes. Most online services require an email address, either for authentication or as a communication channel between the user and the service. The average user sends and receives on the order of hundreds of emails per day, and these fall into varying categories: social, professional, notifications, marketing, transactional, emails that warrant no response, emails that carry files, and emails that require a response, among others. The result is an information overload problem that the mailbox owner can hardly solve manually. There is therefore a growing need for systems that automatically learn and recommend effective ways for users to organize their email, aggregating messages into smaller groups that are easy to interpret and speeding up the process of reading and consulting the mailbox. Several approaches and techniques can alleviate this problem, among them machine learning for email classification and clustering, which can uncover new subsets of emails in the massive inboxes we all have, now or in the future. After a careful review of the state of the art in email text classification, this work elaborates a modular system that supports several preprocessing configurations and uses an ensemble of classifiers to better solve the email classification problem. The system is then adapted to a concrete case study: a desktop email client under development at Mailcube Lda.
The case study tests and analyses different preprocessing configurations using three text classifiers on several users' mailboxes from the Enron Corpus dataset. For validation, the final results are compared with work from the scientific community that uses an identical configuration. The resulting system is expected to suggest how users can organize their inboxes into relevant groups of emails, learning from users' interactions and continuously adapting to the arrival of new emails, thereby improving the overall user experience and saving users time.
ISBN: 9798358444614
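
The summary describes a modular system that combines configurable preprocessing with an ensemble of three text classifiers. The sketch below illustrates that general idea only and is not the dissertation's implementation. The record does not name the three classifiers, so the ones used here (Naive Bayes, nearest centroid, nearest neighbour), the `PreprocessConfig` options, the majority-vote rule, and the toy training emails are all assumptions made up for illustration; they are not drawn from the Enron Corpus.

```python
import math
import re
from collections import Counter, defaultdict
from dataclasses import dataclass

# Tiny illustrative stopword list (assumption, not from the dissertation).
STOPWORDS = frozenset({"the", "a", "an", "to", "of", "and", "is", "for",
                       "in", "on", "at", "your", "you", "please"})

@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical switches for one preprocessing configuration."""
    lowercase: bool = True
    remove_stopwords: bool = True
    min_token_length: int = 2

def preprocess(text, config):
    if config.lowercase:
        text = text.lower()
    tokens = re.findall(r"[A-Za-z]+", text)  # keep alphabetic tokens only
    if config.remove_stopwords:
        tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    return [t for t in tokens if len(t) >= config.min_token_length]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""
    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for c in self.word_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total = sum(self.label_counts.values())
        def log_posterior(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            prior = math.log(self.label_counts[label] / total)
            return prior + sum(math.log((counts[t] + 1) / denom)
                               for t in tokens if t in self.vocab)
        return max(sorted(self.label_counts), key=log_posterior)

class NearestCentroid:
    """Cosine similarity against per-label term-count centroids."""
    def fit(self, docs, labels):
        self.centroids = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.centroids[label].update(tokens)
        return self

    def predict(self, tokens):
        query = Counter(tokens)
        def cosine(label):
            c = self.centroids[label]
            dot = sum(query[t] * c[t] for t in query)
            norm = (math.sqrt(sum(v * v for v in query.values()))
                    * math.sqrt(sum(v * v for v in c.values())))
            return dot / norm if norm else 0.0
        return max(sorted(self.centroids), key=cosine)

class NearestNeighbour:
    """1-nearest-neighbour using Jaccard overlap of token sets."""
    def fit(self, docs, labels):
        self.examples = [(set(d), l) for d, l in zip(docs, labels)]
        return self

    def predict(self, tokens):
        query = set(tokens)
        def jaccard(example):
            union = query | example[0]
            return len(query & example[0]) / len(union) if union else 0.0
        return max(self.examples, key=jaccard)[1]

class VotingEnsemble:
    """Majority vote; on a tie, the earliest-listed classifier's label wins."""
    def __init__(self, classifiers, config):
        self.classifiers = classifiers
        self.config = config

    def fit(self, texts, labels):
        docs = [preprocess(t, self.config) for t in texts]
        for clf in self.classifiers:
            clf.fit(docs, labels)
        return self

    def predict(self, text):
        tokens = preprocess(text, self.config)
        votes = Counter(clf.predict(tokens) for clf in self.classifiers)
        return votes.most_common(1)[0][0]

# Made-up toy mailbox, for illustration only.
TRAIN = [
    ("Meeting agenda for the quarterly budget review", "professional"),
    ("Please review the attached project report before the meeting", "professional"),
    ("Huge discount sale this weekend, buy now", "marketing"),
    ("Exclusive offer: save 50 percent on your next order, buy today", "marketing"),
    ("Dinner with friends on Saturday night", "social"),
    ("Birthday party invitation for Saturday", "social"),
]
texts, labels = zip(*TRAIN)
ensemble = VotingEnsemble(
    [NaiveBayes(), NearestCentroid(), NearestNeighbour()], PreprocessConfig()
).fit(texts, labels)
print(ensemble.predict("Big sale, buy now and save"))  # prints "marketing"
```

Because preprocessing is isolated in `PreprocessConfig`, a different configuration (for example `PreprocessConfig(remove_stopwords=False)`) can be compared by building a second `VotingEnsemble` with the same classifiers, which is one plausible way to run the kind of configuration comparison the summary mentions.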