Enhancing Turkish Coreference Resolution: Insights from deep learning, dropped pronouns, and multilingual transfer learning

Coreference resolution (CR), which is the identification of in-text mentions that refer to the same entity, is a crucial step in natural language understanding. While CR in English has been studied for quite a long time, studies for pro-dropped and morphologically rich languages is an active researc...

Full description

Saved in:
Bibliographic Details
Published in:Computer speech & language Vol. 89; p. 101681
Main Authors: Pamay Arslan, Tuğba, Eryiğit, Gülşen
Format: Journal Article
Language:English
Published: Elsevier Ltd 01-01-2025
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Coreference resolution (CR), which is the identification of in-text mentions that refer to the same entity, is a crucial step in natural language understanding. While CR in English has been studied for quite a long time, studies for pro-dropped and morphologically rich languages is an active research area which has yet to reach sufficient maturity. Turkish, a morphologically highly-rich language, poses interesting challenges for natural language processing tasks, including CR, due to its agglutinative nature and consequent pronoun-dropping phenomenon. This article explores the use of different neural CR architectures (i.e., mention-pair, mention-ranking, and end-to-end) on Turkish, a morphologically highly-rich language, by formulating multiple research questions around the impacts of dropped pronouns, data quality, and interlingual transfer. The preparations made to explore these research questions and the findings obtained as a result of our explorations revealed the first Turkish CR dataset that includes dropped pronoun annotations (of size 4K entities/22K mentions), new state-of-the-art results on Turkish CR, the first neural end-to-end Turkish CR results (70.4% F-score), the first multilingual end-to-end CR results including Turkish (yielding 1.0 percentage points improvement on Turkish) and the demonstration of the positive impact of dropped pronouns on CR of pro-dropped and morphologically rich languages, for the first time in the literature. Our research has brought Turkish end-to-end CR performances (72.0% F-score) to similar levels with other languages, surpassing the baseline scores by 32.1 percentage points. •Present state-of-the-art results for Turkish coreference resolution.•Introduce a novel Turkish dataset compliant with CorefUD standards.•Explore primitive and end-to-end neural coreference architectures on Turkish.•Show the positive impact of interlingual transfer in Turkish-included scenarios.•Demonstrate the importance of dropped pronouns for coreference resolution of pro-dropped and morphologically rich languages.
ISSN:0885-2308
DOI:10.1016/j.csl.2024.101681