Comparison of collocation extraction measures for document indexing
Published in: | 28th International Conference on Information Technology Interfaces, 2006 pp. 451 - 456 |
---|---|
Main Authors: | Petrovic, S.; Snajder, J.; Dalbelo-Basic, B.; Kolar, M. |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 2006 |
Subjects: | |
Abstract | Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by employing a statistical measure that indicates whether two words occur together more often than by chance. Because an abundance of such measures has been proposed by various authors, we compared some of them on the task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We also propose and evaluate extensions of these measures for collocations consisting of three words. |
---|---|
Author | Petrovic, S. Snajder, J. Dalbelo-Basic, B. Kolar, M. |
Author_xml | – sequence: 1 givenname: S. surname: Petrovic fullname: Petrovic, S. organization: Fac. of Electr. Eng. & Comput., Zagreb Univ – sequence: 2 givenname: J. surname: Snajder fullname: Snajder, J. organization: Fac. of Electr. Eng. & Comput., Zagreb Univ – sequence: 3 givenname: B. surname: Dalbelo-Basic fullname: Dalbelo-Basic, B. organization: Fac. of Electr. Eng. & Comput., Zagreb Univ – sequence: 4 givenname: M. surname: Kolar fullname: Kolar, M. organization: Fac. of Electr. Eng. & Comput., Zagreb Univ |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/ITI.2006.1708523 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library Online IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: http://ieeexplore.ieee.org/Xplore/DynWel.jsp sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering; Statistics; Law |
EndPage | 456 |
ExternalDocumentID | 1708523 |
Genre | orig-research |
IEDL.DBID | RIE |
ISBN | 9789537138059 9537138054 |
ISSN | 1330-1012 |
IngestDate | Wed Jun 26 19:22:42 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
OpenAccessLink | http://hrcak.srce.hr/file/69275 |
PageCount | 6 |
ParticipantIDs | ieee_primary_1708523 |
PublicationCentury | 2000 |
PublicationDate | 20060000 |
PublicationDateYYYYMMDD | 2006-01-01 |
PublicationDate_xml | – year: 2006 text: 20060000 |
PublicationDecade | 2000 |
PublicationTitle | 28th International Conference on Information Technology Interfaces, 2006 |
PublicationTitleAbbrev | ITI |
PublicationYear | 2006 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 451 |
SubjectTerms | Cancer; Computational linguistics; Data mining; Guns; Indexing; Law; Legal factors; Natural language processing; Statistics; Stock markets |
Title | Comparison of collocation extraction measures for document indexing |
URI | https://ieeexplore.ieee.org/document/1708523 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
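The abstract describes scoring word pairs with a statistical measure of whether they co-occur more often than by chance. The record does not say which measures the paper compares, so the following is only a minimal sketch of one widely used association measure, pointwise mutual information (PMI), applied to adjacent word pairs. The function name `pmi_scores`, the `min_count` threshold, and the toy token list are all illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ). A positive value means
    the pair occurs together more often than independence would predict.
    Illustrative assumption: this is one common measure, not necessarily
    one of those evaluated in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            # PMI is unreliable for rare pairs, so skip them.
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus, used only to show the call pattern.
    text = ("the supreme court ruled that the supreme court "
            "may review the decision of the lower court")
    ranked = sorted(pmi_scores(text.split()).items(),
                    key=lambda item: item[1], reverse=True)
    for pair, score in ranked:
        print(pair, round(score, 3))
```

Pairs ranked highest by such a score are candidate collocations that could then be used as index terms. A three-word variant would need its own definition of the independence baseline, which this sketch does not attempt.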