Comparison of collocation extraction measures for document indexing


Bibliographic Details
Published in: 28th International Conference on Information Technology Interfaces, 2006, pp. 451-456
Main Authors: Petrovic, S., Snajder, J., Dalbelo-Basic, B., Kolar, M.
Format: Conference Proceeding
Language: English
Published: IEEE 2006
Abstract Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by employing some kind of a statistical measure that indicates whether or not two words occur together more often than by chance. As there is an abundance of these measures proposed by various authors, we have compared some of them on a task of extracting collocations from a corpus of Croatian legal documents for the purpose of document indexing. We propose and evaluate extensions of these measures for collocations consisting of three words.
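As an illustration of the kind of statistical measure the abstract refers to, the following minimal Python sketch scores adjacent word pairs by pointwise mutual information (PMI). PMI is a common association measure, but this record does not say which measures the paper compares, so its use here is an assumption. The function name pmi_scores, the min_count threshold, and the sample sentence are likewise hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ). A positive score means
    the pair occurs together more often than independence would predict.
    Illustrative only: not necessarily a measure used in the paper.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs give unreliable PMI estimates, so skip them.
        if count < min_count:
            continue
        p_xy = count / n_bigrams
        p_x = unigrams[w1] / n_unigrams
        p_y = unigrams[w2] / n_unigrams
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy input; a real corpus would be tokenized and much larger.
tokens = "supreme court ruled that the supreme court may review the ruling".split()
scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Ranking pairs by such a score and keeping the top candidates is one simple way to propose collocations as index terms. The frequency threshold matters because PMI tends to overrate pairs that occur only once or twice.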
Author Petrovic, S.
Snajder, J.
Dalbelo-Basic, B.
Kolar, M.
Author_xml – sequence: 1
  givenname: S.
  surname: Petrovic
  fullname: Petrovic, S.
  organization: Fac. of Electr. Eng. & Comput., Zagreb Univ
– sequence: 2
  givenname: J.
  surname: Snajder
  fullname: Snajder, J.
  organization: Fac. of Electr. Eng. & Comput., Zagreb Univ
– sequence: 3
  givenname: B.
  surname: Dalbelo-Basic
  fullname: Dalbelo-Basic, B.
  organization: Fac. of Electr. Eng. & Comput., Zagreb Univ
– sequence: 4
  givenname: M.
  surname: Kolar
  fullname: Kolar, M.
  organization: Fac. of Electr. Eng. & Comput., Zagreb Univ
ContentType Conference Proceeding
DOI 10.1109/ITI.2006.1708523
Discipline Engineering
Statistics
Law
EndPage 456
ExternalDocumentID 1708523
Genre orig-research
ISBN 9789537138059
9537138054
ISSN 1330-1012
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
OpenAccessLink http://hrcak.srce.hr/file/69275
PageCount 6
PublicationCentury 2000
PublicationDate 20060000
PublicationDateYYYYMMDD 2006-01-01
PublicationDate_xml – year: 2006
  text: 20060000
PublicationDecade 2000
PublicationTitle 28th International Conference on Information Technology Interfaces, 2006
PublicationTitleAbbrev ITI
PublicationYear 2006
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssib002089391
ssib003012261
ssib000986468
ssib005283003
ssj0000395209
ssib001418696
ssib002806550
Score 1.6769423
Snippet Automatic extraction of collocations from a corpus is a well-known problem in the field of natural language processing. It is typically carried out by...
SourceID ieee
SourceType Publisher
StartPage 451
SubjectTerms Cancer
Computational linguistics
Data mining
Guns
Indexing
Law
Legal factors
Natural language processing
Statistics
Stock markets
Title Comparison of collocation extraction measures for document indexing
URI https://ieeexplore.ieee.org/document/1708523