The Impact of Feature Extraction and Selection on SMS Spam Filtering
Published in: | Elektronika ir elektrotechnika Vol. 19; no. 5; p. 67 |
---|---|
Main Authors: | Uysal, A. K.; Gunal, S.; Ergin, S.; Sora Gunal, E. |
Format: | Journal Article |
Language: | English |
Published: | Kaunas University of Technology, Faculty of Telecommunications and Electronics, 01-01-2013 |
Subjects: | Control; Electronic data processing; Methods; Spam (Junk email) |
Abstract | This paper investigates the impact of several feature extraction and feature selection approaches on the filtering of short message service (SMS) spam messages in two languages, Turkish and English. The full feature set of the filtering framework consists of features derived from the bag-of-words (BoW) model together with an ensemble of structural features (SFs) specific to the spam problem. The distinctive BoW features are identified using information-theoretic feature selection methods. Various combinations of the BoW features and SFs are then fed into widely used pattern classification algorithms to classify SMS messages. The filtering framework is evaluated on both Turkish and English SMS message datasets. For this purpose, the first publicly available Turkish SMS message collection was compiled as part of the study. Comprehensive experimental analysis on the respective datasets revealed that combinations of BoW features and SFs, rather than BoW features alone, provide better classification performance on both datasets. The effectiveness of the feature selection methods, however, differs slightly between the two languages. Index Terms: feature extraction, feature selection, SMS, spam filter. |
---|---|
Audience | Academic |
Author | Gunal, S.; Sora Gunal, E.; Uysal, A. K.; Ergin, S. |
Author_xml | 1. Uysal, A. K.; 2. Gunal, S.; 3. Ergin, S.; 4. Sora Gunal, E. |
ContentType | Journal Article |
Copyright | COPYRIGHT 2013 Kaunas University of Technology, Faculty of Telecommunications and Electronics |
DOI | 10.5755/j01.eee.19.5.1829 |
Discipline | Engineering |
EISSN | 2029-5731 |
GeographicLocations | Turkey |
ISSN | 1392-1215 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 5 |
Language | English |
OpenAccessLink | https://doi.org/10.5755/j01.eee.19.5.1829 |
PublicationDate | 2013-01-01 |
PublicationTitle | Elektronika ir elektrotechnika |
PublicationYear | 2013 |
Publisher | Kaunas University of Technology, Faculty of Telecommunications and Electronics |
StartPage | 67 |
SubjectTerms | Control; Electronic data processing; Methods; Spam (Junk email) |
Title | The Impact of Feature Extraction and Selection on SMS Spam Filtering |
Volume | 19 |
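
The pipeline described in the abstract (BoW features ranked by an information-theoretic criterion, then combined with structural features) can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation. The record does not name the selection methods, so information gain is used here as one common information-theoretic choice. The three structural features (message length, digit count, URL presence), the function names, and the toy messages are all hypothetical. The classifier stage is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a message and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(term, docs, labels):
    """Reduction in class entropy from knowing whether `term` occurs in a message."""
    classes = sorted(set(labels))

    def class_counts(subset):
        counter = Counter(subset)
        return [counter[c] for c in classes]

    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(class_counts(part))
        for part in (with_term, without_term)
        if part
    )
    return entropy(class_counts(labels)) - conditional


def select_terms(messages, labels, k):
    """Return the k BoW terms with the highest information gain (ties: alphabetical)."""
    docs = [set(tokenize(m)) for m in messages]
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(t, docs, labels), t))
    return ranked[:k]


def structural_features(message):
    """Hypothetical structural features; the paper's actual SF set is not listed here."""
    has_url = int(bool(re.search(r"https?://|www\.", message.lower())))
    digit_count = sum(ch.isdigit() for ch in message)
    return [len(message), digit_count, has_url]


def featurize(message, terms):
    """Binary BoW indicators for the selected terms, followed by structural features."""
    tokens = set(tokenize(message))
    return [int(t in tokens) for t in terms] + structural_features(message)


# Toy data, invented for illustration (not drawn from the paper's datasets).
messages = [
    "WIN a free prize now, call 0800123456",
    "free entry: text WIN to 80085",
    "are we still meeting for lunch today",
    "call me when you get home",
]
labels = ["spam", "spam", "legitimate", "legitimate"]

selected = select_terms(messages, labels, k=2)
vector = featurize("WIN cash at www.example.com", selected)
```

The resulting vectors could then be passed to any standard classifier. Ranking the BoW terms before concatenation keeps the word-based part of the vector small, so the few structural features are not swamped by thousands of sparse term indicators.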