Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective
Main Author: | |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 13-06-2024 |
Subjects: | |
Summary: | Parsing is the process of analyzing a sentence's syntactic structure by
breaking it down into its grammatical components, and it is critical for various
linguistic applications. Urdu is a low-resource, free word-order language with
complex morphology. The literature suggests that dependency parsing is
well suited to such languages. Our approach begins with a basic feature model
encompassing word location, head word identification, and dependency relations,
followed by a more advanced model integrating part-of-speech (POS) tags and
morphological attributes (e.g., suffixes, gender). We manually annotated a
corpus of news articles of varying complexity. Using MaltParser and the
NivreEager algorithm, we achieved a best labeled accuracy (LA) of 70% and an
unlabeled attachment score (UAS) of 84%, demonstrating the feasibility of
dependency parsing for Urdu. |
DOI: | 10.48550/arxiv.2406.09549 |
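
The two scores quoted in the summary can be made concrete with a small sketch. It assumes that UAS is the fraction of tokens whose predicted head matches the gold head, and that LA is the fraction of tokens whose predicted dependency label matches the gold label. The record does not define LA, so that reading is an assumption. The function name `attachment_scores`, the toy sentence, and the relation labels are hypothetical and are not taken from the paper's treebank or from MaltParser.

```python
from typing import Dict, List, Tuple

# One token's analysis: (index of its head word, dependency label).
# Head index 0 stands for the artificial root node.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Dict[str, float]:
    """Compare predicted arcs with gold arcs, token by token.

    Assumed definitions (not confirmed by the record):
      UAS: share of tokens with the correct head
      LA:  share of tokens with the correct label
      LAS: share of tokens with both correct
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and equally long")

    n = len(gold)
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    label_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return {"UAS": head_ok / n, "LA": label_ok / n, "LAS": both_ok / n}


# Hypothetical four-token sentence with illustrative labels.
gold_arcs = [(2, "subj"), (0, "root"), (4, "mod"), (2, "obj")]
pred_arcs = [(2, "subj"), (0, "root"), (2, "mod"), (2, "adv")]

scores = attachment_scores(gold_arcs, pred_arcs)
print(scores)
```

In this toy case the third token has the wrong head and the fourth has the wrong label, so UAS and LA are both 0.75 while the combined score is 0.5. This shows how a parser can attach words correctly more often than it labels them, as in the 84% versus 70% figures reported above.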