DoT-Net: Document Layout Classification Using Texture-Based CNN

Bibliographic Details
Published in: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1029-1034
Main Authors: Kosaraju, Sai Chandra, Masum, Mohammed, Tsaku, Nelson Zange, Patel, Pritesh, Bayramoglu, Tanju, Modgil, Girish, Kang, Mingon
Format: Conference Proceeding
Language: English
Published: IEEE, 01-09-2019
Description
Summary: Document Layout Analysis (DLA) is a segmentation process that decomposes a scanned document image into its blocks of interest and classifies them. DLA is essential in a large number of applications, such as Information Retrieval, Machine Translation, Optical Character Recognition (OCR) systems, and structured data extraction from documents. However, identifying document blocks in DLA is challenging due to variations in block location, inter- and intra-class variability, and background noise. In this paper, we propose a novel texture-based convolutional neural network for document layout analysis, called DoT-Net. DoT-Net is a multiclass classifier that can effectively identify document component blocks such as text, image, table, mathematical expression, and line-diagram, whereas most related methods have focused on the text vs. non-text block classification problem. DoT-Net captures textural variations among the multiclass regions of documents. Our proposed method, DoT-Net, achieved promising results, outperforming state-of-the-art document layout classifiers in accuracy, F1 score, and AUC. The open-source code of DoT-Net is available at https://github.com/datax-lab/DoTNet.
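The record describes the method only at a high level: a CNN that classifies document regions into the five block classes named above based on their textural appearance. The sketch below is a minimal illustration of that kind of multiclass patch classifier, not the authors' architecture; the 64x64 patch size, layer widths, and every other hyperparameter are assumptions made for clarity, and the actual implementation is in the official repository at https://github.com/datax-lab/DoTNet.

# Minimal sketch of a texture-oriented CNN patch classifier (NOT the official DoT-Net).
# All layer sizes, the 64x64 patch size, and the class ordering are assumptions for
# illustration; see https://github.com/datax-lab/DoTNet for the authors' implementation.
import torch
import torch.nn as nn

BLOCK_CLASSES = ["text", "image", "table", "math", "line-diagram"]  # classes named in the abstract

class PatchClassifier(nn.Module):
    def __init__(self, num_classes=len(BLOCK_CLASSES)):
        super().__init__()
        # Small convolutional stack: local filters respond to textural cues
        # (stroke density, ruling lines, halftone dots) rather than global page layout.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool to one texture descriptor per patch
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):  # x: (batch, 1, H, W) grayscale patches
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    model = PatchClassifier()
    patches = torch.rand(8, 1, 64, 64)   # 8 hypothetical 64x64 grayscale patches
    logits = model(patches)               # (8, 5) class scores
    print([BLOCK_CLASSES[i] for i in logits.argmax(dim=1)])

In use, such a classifier would be applied to sampled regions of a scanned page, with the predicted class assigned to the corresponding block of the segmentation.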
ISSN: 2379-2140
DOI: 10.1109/ICDAR.2019.00168