Zero-Shot Learning Based Approach For Medieval Word Recognition using Deep-Learned Features

Bibliographic Details
Published in: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 345-350
Main Authors: Chanda, Sukalpa; Baas, Jochem; Haitink, Daniel; Hamel, Sebastien; Stutzmann, Dominique; Schomaker, Lambert
Format: Conference Proceeding
Language: English
Published: IEEE 01-08-2018
Description
Summary: Historical manuscripts reflect our past. Large quantities of historical handwritten documents are currently being digitized and archived all over the world. Automatic text indexing and retrieval systems draw on these digital repositories to fetch only the documents an end user is interested in. Regular OCR technology cannot provide this service reliably; a word recognition or word spotting algorithm performs the task instead. Word recognition systems require enough labelled data per class for training, and all word classes must be taught beforehand. Word spotting avoids this need for prior training, but such systems often carry additional overhead, such as a language model, to deal with "out of lexicon" words. Zero-shot learning is a possible alternative in this situation. A zero-shot learning algorithm can handle unseen classes, provided it has been given rich discriminating features and a reliable "attribute description" per class during training. Since deep-learned features have sufficient discriminating power, a deep learning framework is used here for feature extraction. To the best of our knowledge, this is the first work on "out of lexicon" medieval word recognition using a zero-shot learning framework. We obtained very encouraging results (accuracy ≈57% for "out of lexicon" classes) while dealing with 166 training classes and 50 unseen test classes.
DOI: 10.1109/ICFHR-2018.2018.00067
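
Illustration: The zero-shot idea in the summary (recognising unseen word classes through a per-class "attribute description") can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the record does not say which attributes or network the paper uses, so the per-letter count signature, the function names, and the hand-made "predicted" scores below are all assumptions. In a real system the attribute scores would come from a model trained on deep-learned image features.

```python
import math
import string

ALPHABET = string.ascii_lowercase


def word_signature(word):
    """Hypothetical attribute description: per-letter counts of a word."""
    return [float(word.count(ch)) for ch in ALPHABET]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def zero_shot_classify(predicted_attributes, candidate_words):
    """Pick the candidate whose signature best matches the predicted attributes.

    Candidates need no training images, only an attribute signature.
    """
    return max(
        candidate_words,
        key=lambda w: cosine(predicted_attributes, word_signature(w)),
    )


# Word classes assumed to be absent from training ("out of lexicon").
unseen_words = ["rex", "deus", "terra"]

# Stand-in for the noisy attribute scores a trained model might predict
# for an image of the word "deus" (values invented for illustration).
predicted = word_signature("deus")
predicted[ALPHABET.index("d")] = 0.7   # weak evidence for "d"
predicted[ALPHABET.index("a")] = 0.2   # spurious evidence for "a"

print(zero_shot_classify(predicted, unseen_words))
```

Because classification only compares predicted attributes against class signatures, adding a new word class means adding its signature, not collecting labelled images for it.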