Urdu Dependency Parsing and Treebank Development: A Syntactic and Morphological Perspective

Parsing is the process of analyzing a sentence's syntactic structure by breaking it down into its grammatical components. and is critical for various linguistic applications. Urdu is a low-resource, free word-order language and exhibits complex morphology. Literature suggests that dependency pa...

Full description

Saved in:

Bibliographic Details
Main Author:	Habib, Nudrat
Format:	Journal Article
Language:	English
Published:	13-06-2024
Subjects:	Computer Science - Computation and Language Computer Science - Learning
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Parsing is the process of analyzing a sentence's syntactic structure by breaking it down into its grammatical components. and is critical for various linguistic applications. Urdu is a low-resource, free word-order language and exhibits complex morphology. Literature suggests that dependency parsing is well-suited for such languages. Our approach begins with a basic feature model encompassing word location, head word identification, and dependency relations, followed by a more advanced model integrating part-of-speech (POS) tags and morphological attributes (e.g., suffixes, gender). We manually annotated a corpus of news articles of varying complexity. Using Maltparser and the NivreEager algorithm, we achieved a best-labeled accuracy (LA) of 70% and an unlabeled attachment score (UAS) of 84%, demonstrating the feasibility of dependency parsing for Urdu.
DOI:	10.48550/arxiv.2406.09549