Improving Thai educational Web page classification using inverse class frequency

Bibliographic Details
Published in: IEEE International Symposium on Communications and Information Technology, 2005. ISCIT 2005, Vol. 2, pp. 817-820
Main Authors: Lertnattee, V., Theeramunkong, T.
Format: Conference Proceeding
Language: English
Published: IEEE 2005
Abstract Automatic text classification for a Web collection is a challenging task, especially when the language is not English, as with Thai. Most Thai educational Web pages nevertheless include English terms because of their technical content. Many technical terms and typing errors, in both Thai and English, are found on university Web sites. Most previous work on text categorization has used term frequency and inverse document frequency to represent the importance of terms. In this paper, we use inverse class frequency instead of inverse document frequency in centroid-based text categorization because it works well on a collection with a large number of unique terms. The experimental results show that inverse class frequency is useful, especially when it is applied to both prototype and query vectors.
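
The idea in the abstract can be illustrated with a minimal Python sketch of centroid-based categorization that weights terms by term frequency times inverse class frequency. This is not the authors' implementation: the record gives no formula, so the sketch assumes the common definition icf(t) = log(|C| / cf(t)), where |C| is the number of classes and cf(t) is the number of classes containing term t. The function names and the toy data are hypothetical. The same weighting is applied to both the class prototypes (centroids) and the query vector, which is the setting the abstract reports as most useful.

```python
import math
from collections import Counter, defaultdict


def inverse_class_frequency(training):
    """training maps a class label to a list of tokenized documents.
    Assumed definition: icf(t) = log(number of classes / classes containing t)."""
    classes_with_term = defaultdict(set)
    for label, docs in training.items():
        for tokens in docs:
            for term in tokens:
                classes_with_term[term].add(label)
    n_classes = len(training)
    return {t: math.log(n_classes / len(cs)) for t, cs in classes_with_term.items()}


def weighted_vector(tokens, icf):
    """Weight each known term by tf * icf; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: f * icf[t] for t, f in tf.items() if t in icf}


def normalize(vec):
    """Scale a sparse vector to unit length (empty if all weights are zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def build_centroids(training, icf):
    """Prototype vector per class: normalized sum of normalized document vectors."""
    centroids = {}
    for label, docs in training.items():
        total = defaultdict(float)
        for tokens in docs:
            for t, w in normalize(weighted_vector(tokens, icf)).items():
                total[t] += w
        centroids[label] = normalize(total)
    return centroids


def classify(tokens, centroids, icf):
    """Assign the class whose centroid has the highest cosine similarity to the query."""
    query = normalize(weighted_vector(tokens, icf))

    def cosine(centroid):
        return sum(w * centroid.get(t, 0.0) for t, w in query.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))


# Hypothetical toy collection; real Thai pages would first need word segmentation.
training = {
    "pharmacy": [["drug", "dose", "course"], ["drug", "pharmacology", "course"]],
    "engineering": [["circuit", "signal", "course"], ["circuit", "course", "lab"]],
}
icf = inverse_class_frequency(training)
centroids = build_centroids(training, icf)
print(classify(["drug", "dose", "course"], centroids, icf))
```

In this toy collection the term "course" occurs in every class, so its inverse class frequency is zero and it does not affect the decision. Because the statistic is computed over classes rather than documents, a rare term such as a misspelling that appears in only one class still receives the same weight as any other single-class term.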
Author Lertnattee, V.
Theeramunkong, T.
Author_xml – sequence: 1
  givenname: V.
  surname: Lertnattee
  fullname: Lertnattee, V.
  organization: Fac. of Pharmacy, Silpakorn Univ., Nakorn Pathom, Thailand
– sequence: 2
  givenname: T.
  surname: Theeramunkong
  fullname: Theeramunkong, T.
ContentType Conference Proceeding
DOI 10.1109/ISCIT.2005.1566992
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP All) 1998-Present
Discipline Engineering
Statistics
EndPage 820
ExternalDocumentID 1566992
Genre orig-research
ISBN 9780780395381
0780395387
IngestDate Wed Jun 26 19:20:54 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
PageCount 4
ParticipantIDs ieee_primary_1566992
PublicationCentury 2000
PublicationDate 20050000
PublicationDateYYYYMMDD 2005-01-01
PublicationDate_xml – year: 2005
  text: 20050000
PublicationDecade 2000
PublicationTitle IEEE International Symposium on Communications and Information Technology, 2005. ISCIT 2005
PublicationTitleAbbrev ISCIT
PublicationYear 2005
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 817
SubjectTerms Bayesian methods
Electronic mail
Frequency
Natural languages
Prototypes
Statistics
Support vector machine classification
Support vector machines
Text categorization
Web pages
Title Improving Thai educational Web page classification using inverse class frequency
URI https://ieeexplore.ieee.org/document/1566992
Volume 2