Improving Thai educational Web page classification using inverse class frequency
Published in: | IEEE International Symposium on Communications and Information Technology, 2005. ISCIT 2005 Vol. 2; pp. 817 - 820 |
---|---|
Main Authors: | Lertnattee, V.; Theeramunkong, T. |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 2005 |
Subjects: | |
Abstract | Automatic text classification for a Web collection is a challenging task, especially when the language is not English, as with Thai. Most Thai educational Web pages also include English terms because of their technical content, and many technical terms and typing errors, in both Thai and English, are found on university Web sites. Most previous work on text categorization used term frequency and inverse document frequency to represent the importance of terms. In this paper, we use inverse class frequency instead of inverse document frequency in centroid-based text categorization because it works well on a collection with a large number of unique terms. The experimental results show that inverse class frequency is useful, especially when it is applied to both prototype and query vectors. |
---|---|
Author | Lertnattee, V.; Theeramunkong, T. |
Author_xml | – sequence: 1 givenname: V. surname: Lertnattee fullname: Lertnattee, V. organization: Fac. of Pharmacy, Silpakorn Univ., Nakorn Pathom, Thailand – sequence: 2 givenname: T. surname: Theeramunkong fullname: Theeramunkong, T. |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/ISCIT.2005.1566992 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library Online IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Statistics |
EndPage | 820 |
ExternalDocumentID | 1566992 |
Genre | orig-research |
GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR AARBI ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK OCL RIE RIL |
IEDL.DBID | RIE |
ISBN | 9780780395381 0780395387 |
IngestDate | Wed Jun 26 19:20:54 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
PageCount | 4 |
ParticipantIDs | ieee_primary_1566992 |
PublicationCentury | 2000 |
PublicationDate | 20050000 |
PublicationDateYYYYMMDD | 2005-01-01 |
PublicationDate_xml | – year: 2005 text: 20050000 |
PublicationDecade | 2000 |
PublicationTitle | IEEE International Symposium on Communications and Information Technology, 2005. ISCIT 2005 |
PublicationTitleAbbrev | ISCIT |
PublicationYear | 2005 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0000393761 |
Score | 1.3654705 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 817 |
SubjectTerms | Bayesian methods; Electronic mail; Frequency; Natural languages; Prototypes; Statistics; Support vector machine classification; Support vector machines; Text categorization; Web pages |
Title | Improving Thai educational Web page classification using inverse class frequency |
URI | https://ieeexplore.ieee.org/document/1566992 |
Volume | 2 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
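The abstract describes replacing inverse document frequency with inverse class frequency (ICF) in a centroid-based classifier, and applying ICF to both the prototype (class) vectors and the query vector. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the common definition icf(t) = log(number of classes / number of classes containing t), summed term frequencies for each class prototype, and cosine similarity for matching; the record does not confirm any of these details. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def inverse_class_frequency(docs_by_class):
    """Assumed definition: icf(t) = log(num_classes / cf(t)),
    where cf(t) is the number of classes whose documents contain t."""
    class_freq = Counter()
    for docs in docs_by_class.values():
        terms_in_class = set()
        for tokens in docs:
            terms_in_class.update(tokens)
        class_freq.update(terms_in_class)
    n_classes = len(docs_by_class)
    return {t: math.log(n_classes / cf) for t, cf in class_freq.items()}


def normalize(vec):
    """Scale a sparse vector to unit length (empty dict if all weights are zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else {}


def build_prototypes(docs_by_class, icf):
    """One prototype per class: summed term frequency times icf, then normalized."""
    prototypes = {}
    for label, docs in docs_by_class.items():
        tf = Counter()
        for tokens in docs:
            tf.update(tokens)
        prototypes[label] = normalize({t: f * icf[t] for t, f in tf.items()})
    return prototypes


def classify(tokens, prototypes, icf, weight_query=True):
    """Return (best_label, scores) by cosine similarity to each prototype.
    weight_query=True applies icf to the query vector as well."""
    tf = Counter(t for t in tokens if t in icf)  # ignore unseen terms
    query = normalize(
        {t: f * (icf[t] if weight_query else 1.0) for t, f in tf.items()}
    )
    scores = {
        label: sum(w * proto.get(t, 0.0) for t, w in query.items())
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy collection: two classes, tokenized documents.
TOY = {
    "pharmacy": [["drug", "dose", "course"], ["drug", "clinic", "course"]],
    "engineering": [["circuit", "signal", "course"], ["circuit", "robot", "course"]],
}

icf = inverse_class_frequency(TOY)
prototypes = build_prototypes(TOY, icf)
label, scores = classify(["drug", "course", "course"], prototypes, icf)
```

Under this assumed definition, a term that occurs in every class (here `course`) gets an ICF of zero and stops contributing to the similarity, which is one way to read the abstract's remark that ICF suits collections with many unique terms. Setting `weight_query=False` leaves the query as raw term frequencies, so the two weighting variants can be compared on the same data.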