Search Results - "Thibault Clérice"
1
You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine
Published in Journal of Data Mining and Digital Humanities (26-12-2023) “…Layout Analysis (the identification of zones and their classification) is the first step along line segmentation in Optical Character Recognition and similar…”
Journal Article
2
Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin
Published in Journal of Data Mining and Digital Humanities (07-04-2020) “…Tokenization of modern and old Western European languages seems to be fairly simple, as it stands on the presence mostly of markers such as spaces and…”
Journal Article
3
Artificial colorization of digitized microfilms: a preliminary study
Published in Journal of Data Mining and Digital Humanities (12-04-2023) “…A lot of available digitized manuscripts online are actually digitized microfilms, a technology dating back from the 1930s. With the progress of artificial…”
Journal Article
4
OCR17: Ground Truth and Models for 17th c. French Prints (and hopefully more)
Published in Journal of Data Mining and Digital Humanities (28-06-2023) “…Machine learning begins with machine teaching: in the following paper, we present the data that we have prepared to kick-start the training of reliable OCR…”
Journal Article
5
Distributed Text Services (DTS): A Community-Built API to Publish and Consume Text Collections as Linked Data
Published in Journal of the Text Encoding Initiative (13-01-2023) “…This paper presents the Distributed Text Service (DTS) API Specification, a community-built effort to facilitate the publication and consumption of texts and…”
Journal Article
6
Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Published in Journal of Data Mining and Digital Humanities (14-02-2021) “…This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly…”
Journal Article
7
ARletta. Open-Source Handwritten Text Recognition Models for Historic Dutch
Published in Journal of Open Humanities Data (11-07-2024) “…We release ARletta, a series of open-source models for the automated transcription of historic Dutch-language handwritten sources, which has remained a…”
Journal Article
8
CREMMA Medii Aevi: Literary Manuscript Text Recognition in Latin
Published in Journal of Open Humanities Data (12-04-2023) “…This paper presents a novel segmentation and handwritten text recognition dataset for Medieval Latin from the 11th to the 16th century. It connects with…”
Journal Article
9
Evaluating Deep Learning Methods for Word Segmentation of Scripta Continua Texts in Old French and Latin / Évaluer les méthodes de deep learning pour la segmentation des mots de textes en scripta continua en ancien français et en latin
Published in Journal of Data Mining and Digital Humanities (01-04-2020) “…Tokenization of modern and old Western European languages seems to be fairly simple, as it stands on the presence mostly of markers such…”
Journal Article
10
Noisy medieval data, from digitized manuscript to stylometric analysis: Evaluating Paul Meyer’s hagiographic hypothesis
Published in Digital Scholarship in the Humanities (01-10-2021) “…Stylometric analysis of medieval vernacular texts is still a significant challenge: the importance of scribal variation, be it spelling or more substantial, as…”
Journal Article
11
Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Published 05-02-2021 “…Journal of Data Mining & Digital Humanities, 2021, Digital humanities in languages (February 14, 2021) jdmdh:6485 This paper describes the process of building…”
Journal Article
12
Les outils CapiTainS, l’édition numérique et l’exploitation des textes [CapiTainS Toolkit, Digital Editing and Data Reuse]
Published in Médiévales (15-12-2017)
Journal Article
13
Standardizing linguistic data: method and tools for annotating (pre-orthographic) French
Published 22-11-2020 “…Proceedings of the 2nd International Digital Tools & Uses Congress (DTUC '20), Oct 2020, Hammamet, Tunisia With the development of big corpora of various…”
Journal Article
14
Continuous Integration and Unit Testing of Digital Editions
Published in Digital Humanities Quarterly (01-01-2017) “…Over the last few years, the Perseus Digital Library (PDL) and the Open Philology Project (OPP) have been moving towards enabling better interoperability and…”
Journal Article
15
Continuous Integration and Unit Testing of Digital Editions
Published in Digital Humanities Quarterly (22-02-2018) “…Over the last few years, the Perseus Digital Library (PDL) and the Open Philology Project (OPP) have been moving towards enabling better interoperability and…”
Journal Article
16
Detecting Sexual Content at the Sentence Level in First Millennium Latin Texts
Published 25-09-2023 “…Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ELRA Language Resources Association (ELRA);…”
Journal Article
17
CapiTainS Toolkit, Digital Editing and Data Reuse
Published in Médiévales (15-12-2017) “…Les outils CapiTainS, l'édition numérique et l'exploitation des textes / CapiTainS toolkit, digital editing and data reuse. Abstract: Digital editing has taken a…”
Journal Article
18
You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine
Published 19-07-2022 “…Layout Analysis (the identification of zones and their classification) is the first step along line segmentation in Optical Character Recognition and similar…”
Journal Article
19
Les outils CapiTainS, l’édition numérique et l’exploitation des textes [CapiTainS Toolkit, Digital Editing and Data Reuse]
Published in Médiévales (2017) “…Resulting from a collaboration between members of the Perseids and Perseus teams at Tufts and the Humboldt Chair of Digital Humanities at the University of Leipzig, the…”
Journal Article
20
Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre
Published in Journal of Data Mining and Digital Humanities (01-02-2021) “…This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly…”
Journal Article