Automatic Arabic term extraction from special domain corpora
Published in: | 2014 International Conference on Asian Language Processing (IALP) pp. 1 - 5 |
---|---|
Main Authors: | , , , |
Format: | Conference Proceeding |
Language: | English |
Published: | IEEE, 01-10-2014 |
Summary: | The availability of machine-readable Arabic special domain text in digital libraries, websites of Arabic university publications, and refereed journals fosters numerous interesting studies and applications. Among these applications is automatic term extraction from special domain corpora. The extracted terms can serve as a foundation for other applications and research, such as special domain dictionary building, terminology resource creation, and special domain ontology construction. Our literature survey shows a lack of such studies for Arabic special domain text; moreover, the few studies that have been identified use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special domain corpora. Our methods are based on two simple heuristics: the most frequent words and n-grams in special domain corpora are typically terms, and these terms are typically bounded by function words. We applied our methods to a corpus of applied Arabic linguistics. The results are comparable to those of other Arabic term extraction studies: accuracy was 87% when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included. |
---|---|
DOI: | 10.1109/IALP.2014.6973468 |
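
The two heuristics described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `candidate_terms`, the tiny `FUNCTION_WORDS` list, and the `max_n` cutoff are all assumptions made for illustration, and the input is assumed to be already tokenized. The sketch splits the token stream at function words and counts every n-gram inside the resulting chunks, so that frequent counts point to likely terms.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only);
# a real system would need a much fuller Arabic stop list.
FUNCTION_WORDS = {"في", "من", "على", "إلى", "عن", "و", "أن", "هو", "هي"}


def candidate_terms(tokens, function_words=FUNCTION_WORDS, max_n=3):
    """Count n-grams (n <= max_n) that contain no function word.

    Function words act as boundaries: the token stream is cut at each one,
    and n-grams are counted only inside the resulting chunks.
    """
    counts = Counter()
    chunk = []
    # A trailing None sentinel flushes the last chunk.
    for tok in list(tokens) + [None]:
        if tok is None or tok in function_words:
            for n in range(1, max_n + 1):
                for i in range(len(chunk) - n + 1):
                    counts[" ".join(chunk[i:i + n])] += 1
            chunk = []
        else:
            chunk.append(tok)
    return counts
```

Calling `candidate_terms(tokens).most_common(k)` on a tokenized corpus would then give the `k` most frequent candidates, which the first heuristic treats as the likeliest terms. Whether to count all n-grams inside a chunk or only the full chunk is a design choice the summary does not settle; this sketch counts all of them.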