The Impact of Feature Extraction and Selection on SMS Spam Filtering

Bibliographic Details
Published in: Elektronika ir elektrotechnika Vol. 19; no. 5; p. 67
Main Authors: Uysal, A. K., Gunal, S., Ergin, S., Sora Gunal, E.
Format: Journal Article
Language: English
Published: Kaunas University of Technology, Faculty of Telecommunications and Electronics 01-01-2013
Abstract This paper investigates the impact of several feature extraction and feature selection approaches on the filtering of short message service (SMS) spam messages in two languages, Turkish and English. The full feature set of the filtering framework consists of features originating from the bag-of-words (BoW) model together with an ensemble of structural features (SFs) specific to the spam problem. The distinctive BoW features are identified using information-theoretic feature selection methods. Various combinations of the BoW features and SFs are then fed into widely used pattern classification algorithms to classify SMS messages. The filtering framework is evaluated on both Turkish and English SMS message datasets. For this purpose, the first publicly available Turkish SMS message collection was compiled as part of the study. Comprehensive experimental analysis on the respective datasets revealed that combinations of BoW features and SFs, rather than BoW features alone, provide better classification performance on both datasets. The effectiveness of the feature selection methods, however, differs slightly between the two languages. Index Terms: Feature extraction, feature selection, SMS, spam filter.
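The pipeline outlined in the abstract (BoW features ranked by an information-theoretic criterion, then combined with structural features) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say which selection methods or structural features the paper uses, so information gain stands in as one common information-theoretic criterion, and the four structural features (length, digit count, uppercase count, URL flag), the toy messages, and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Toy data, invented for illustration only (not from the paper's datasets).
MESSAGES = [
    "WIN a FREE prize now call 09061234567",
    "free entry win cash now",
    "are we still meeting for lunch",
    "call me when you are home",
]
LABELS = ["spam", "spam", "ham", "ham"]


def tokenize(message):
    """Lowercase a message and return its runs of Unicode letters."""
    return re.findall(r"[^\W\d_]+", message.lower())


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the message'."""
    base = entropy(list(Counter(labels).values()))
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    conditional = 0.0
    for part in (present, absent):
        if part:
            weight = len(part) / len(labels)
            conditional += weight * entropy(list(Counter(part).values()))
    return base - conditional


def select_terms(messages, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    docs = [set(tokenize(m)) for m in messages]
    vocab = sorted(set().union(*docs))
    ranked = sorted(
        vocab, key=lambda t: information_gain(docs, labels, t), reverse=True
    )
    return ranked[:k]


def structural_features(message):
    """Hypothetical structural features; the paper's actual SF set may differ."""
    lowered = message.lower()
    return [
        len(message),                          # message length in characters
        sum(ch.isdigit() for ch in message),   # number of digits
        sum(ch.isupper() for ch in message),   # number of uppercase letters
        int("http" in lowered or "www." in lowered),  # crude URL indicator
    ]


def to_vector(message, terms):
    """Concatenate binary BoW indicators for `terms` with structural features."""
    tokens = set(tokenize(message))
    return [int(t in tokens) for t in terms] + structural_features(message)


if __name__ == "__main__":
    selected = select_terms(MESSAGES, LABELS, 4)
    print(selected)
    print(to_vector("FREE cash now", selected))
```

The resulting vectors could be passed to any standard classifier. Keeping term selection separate from the structural features mirrors the abstract's point that the two feature groups are combined in various ways before classification.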
Audience Academic
ContentType Journal Article
Copyright COPYRIGHT 2013 Kaunas University of Technology, Faculty of Telecommunications and Electronics
DOI 10.5755/j01.eee.19.5.1829
Discipline Engineering
EISSN 2029-5731
GeographicLocations Turkey
ISSN 1392-1215
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
OpenAccessLink https://doi.org/10.5755/j01.eee.19.5.1829
SubjectTerms Control
Electronic data processing
Methods
Spam (Junk email)