Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Published in: | Journal of data mining and digital humanities Vol. 2021; no. Digital humanities in... |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: | INRIA 14-02-2021 Nicolas Turenne |
Summary: | This paper describes the process of building an annotated corpus and training
models for classical French literature, with a focus on theatre, and
particularly comedies in verse. It was originally developed as a preliminary
step to the stylometric analyses presented in Cafiero and Camps [2019]. The use
of a recent lemmatiser based on neural networks and a CRF tagger makes it possible to
achieve accuracies beyond the current state of the art on the in-domain test,
and proves robust in out-of-domain tests, i.e. up to 20th-century novels. |
---|---|
ISSN: | 2416-5999 |
DOI: | 10.46298/jdmdh.6485 |