Extraction of Literary Character Information in Portuguese

Bibliographic Details
Published in:	Linguamática (Braga, Portugal) Vol. 15; no. 1; pp. 31 - 40
Main Author:	Bick, Eckhard
Format:	Journal Article
Language:	English
Published:	Universidade do Minho & Universidade de Vigo 01-01-2023
Subjects:	anaphora resolution; constraint grammar; distant reading; NER

Description
Summary:	This chapter describes PALAVRAS-DIP, a system for the automatic identification of characters and their social profiles in Portuguese and Brazilian literature. The system has been designed as an add-on module for a morphosyntactic and semantic parser. We tag human named entities (NE) for profession and social position, and use Constraint Grammar (CG) relational tags to keep track of co-reference (e.g. pronoun anaphora, zero-subject verbs) and family relations between the characters. The resulting base annotation allows the extraction of character networks. The extraction program recognizes and bundles character name variants and distinguishes between names with a narrative function and simple cultural references. System development was motivated by DIP, a shared-task evaluation on 100 historical novels, where a prototype version achieved reasonable F-scores for character identification (63.4%) and alias resolution (68.1%), but underperformed for family relations (15.5%).
ISSN:	1647-0818
DOI:	10.21814/lm.15.1.397