Unknown words analysis in POS tagging of Sinhala language

Bibliographic Details
Published in: 2014 14th International Conference on Advances in ICT for Emerging Regions (ICTer), p. 270
Main Authors: Jayaweera, A. J. P. M. P., Dias, N. G. J.
Format: Conference Proceeding
Language: English
Published: IEEE 01-12-2014
Description
Summary: Unknown words, i.e., words that appear in sentences but are not contained in the training corpus, are one of the most frequently occurring problems in the part-of-speech (POS) tagging process. New words are continually coined, and people often use words that the system does not expect when parsing. The problem gets worse as NLP systems are used in more and more online computer applications. Acronyms and proper names are created very often, and new nouns and verbs are added to the language at a surprising rate, so it is impossible to train the tagger on every possible word in the language. Unknown words are therefore non-negligible in POS tagging, and a complete tagger must be equipped with some knowledge for suggesting a tag for an unknown word.
ISBN: 9781479977314, 1479977314
DOI: 10.1109/ICTER.2014.7083928
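
The summary defines an unknown word as one that occurs in a sentence but not in the training corpus, and says a tagger needs some way to suggest a tag for it. This record does not describe the authors' actual method, so the following is only a minimal illustrative sketch of one generic approach (a suffix-frequency heuristic). The function names, the tag labels, and the English toy corpus are all assumptions made for illustration; none of them comes from the paper.

```python
from collections import Counter, defaultdict

def train(tagged_sentences, max_suffix_len=3):
    """Collect the vocabulary plus tag counts per word suffix (hypothetical helper)."""
    vocab = set()
    suffix_tags = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            vocab.add(word)
            tag_counts[tag] += 1
            # Record the tag under every suffix of length 1..max_suffix_len.
            for n in range(1, min(max_suffix_len, len(word)) + 1):
                suffix_tags[word[-n:]][tag] += 1
    return vocab, suffix_tags, tag_counts

def unknown_words(tokens, vocab):
    """Return the tokens that never occurred in the training corpus."""
    return [w for w in tokens if w not in vocab]

def guess_tag(word, suffix_tags, tag_counts, max_suffix_len=3):
    """Suggest a tag from the longest suffix seen in training.

    Falls back to the most frequent tag overall when no suffix matches.
    """
    for n in range(min(max_suffix_len, len(word)), 0, -1):
        counts = suffix_tags.get(word[-n:])
        if counts:
            return counts.most_common(1)[0][0]
    return tag_counts.most_common(1)[0][0]

# Toy English data, purely illustrative (not Sinhala, not from the paper).
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("is", "VERB"), ("walking", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("is", "VERB"), ("talking", "VERB")],
]
vocab, suffix_tags, tag_counts = train(TOY_CORPUS)
```

With this toy corpus, `unknown_words(["the", "dog", "is", "jumping"], vocab)` isolates `jumping`, and `guess_tag` suggests a tag for it from the `ing` suffix shared with the training words. A real tagger for Sinhala would need language-specific morphological cues and a proper tagged corpus.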