MSdB-NMF: MultiSpectral Document Image Binarization Framework via Non-Negative Matrix Factorization Approach

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 29, pp. 9099-9112
Main Authors: Salehani, Yaser Esmaeili; Arabnejad, Ehsan; Rahiche, Abderrahmane; Bakhta, Athmane; Cheriet, Mohamed
Format: Journal Article
Language: English
Published: United States IEEE 01-01-2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: In this paper, we propose a novel method for multispectral document image binarization (MSdB) based on non-negative matrix factorization (NMF). The proposed MSdB-NMF framework has three steps: i) an NMF-based feature extraction algorithm built on a new optimization problem; ii) a post-processing method; and iii) application of any existing gray/RGB binarization scheme. In the first step, we extract $N$ features out of $B$ spectral bands ($N < B$) together with their corresponding coefficient matrix. We introduce a novel objective formulation that accounts for robustness (to noise and various types of degradation) and sparseness (related to the ratio of text pixels to background pixels). We employ multiplicative update rules to solve the proposed minimization problem and prove the convergence of the resulting feature extraction algorithm. In the next step, we select an appropriate feature vector or, equivalently, its corresponding coefficient vector. The selection is made either visually or automatically via a post-processing method that uses benchmark binarization methods as a baseline. In the last step, we apply existing binarization methods such as Sauvola and Howe to the selected coefficient vector. The proposed framework is applicable to any kind of MS or hyperspectral (HS) document image and requires no prior knowledge, such as side information about the spectral bands of the MS/HS document image. We evaluate the framework on two MS document image datasets. The experimental results confirm that it outperforms several state-of-the-art binarization schemes, including the winner of the MS-TEx-2015 contest.
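As a rough illustration of the first step only, the sketch below factorizes a B-band image into N features and N coefficient images using the standard Lee-Seung multiplicative updates for the plain Frobenius-norm NMF objective. This is an assumption made for illustration: it is not the robust, sparsity-aware objective proposed in the paper, and the function names, the synthetic two-material image, and all parameter values are hypothetical.

```python
import numpy as np


def cube_to_matrix(cube):
    """Flatten a (B, rows, cols) multispectral cube into a (B, rows*cols) matrix."""
    return cube.reshape(cube.shape[0], -1)


def nmf_multiplicative(V, n_features, n_iter=200, eps=1e-9, seed=0):
    """Plain Frobenius-norm NMF, V ~ W @ H, via Lee-Seung multiplicative updates.

    V: (B, P) non-negative matrix, one row per band, one column per pixel.
    Returns W of shape (B, N) (features) and H of shape (N, P) (coefficients).
    NOTE: standard NMF only; the paper's robust/sparse objective is not modelled.
    """
    rng = np.random.default_rng(seed)
    n_bands, n_pixels = V.shape
    W = rng.random((n_bands, n_features)) + eps
    H = rng.random((n_features, n_pixels)) + eps
    for _ in range(n_iter):
        # Element-wise updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H


# Hypothetical synthetic document: a dark "ink" rectangle on a background,
# observed in B = 8 bands, each material with its own random spectrum.
rng = np.random.default_rng(1)
rows, cols, B, N = 16, 16, 8, 2
text_mask = np.zeros((rows, cols))
text_mask[4:12, 6:10] = 1.0
spectra = rng.random((B, 2)) + 0.1                    # columns: background, ink
abundances = np.stack([1.0 - text_mask, text_mask])   # shape (2, rows, cols)
cube = np.einsum("bk,kij->bij", spectra, abundances)  # shape (B, rows, cols)

V = cube_to_matrix(cube)
W, H = nmf_multiplicative(V, N)
coef_images = H.reshape(N, rows, cols)  # one candidate image per feature
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each slice of `coef_images` plays the role of a candidate coefficient vector for the selection step; the chosen one would then be handed to an existing binarizer, for example a Sauvola threshold such as `skimage.filters.threshold_sauvola`, which is not used here so that the sketch depends on NumPy alone.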
ISSN: 1057-7149
1941-0042
DOI: 10.1109/TIP.2020.3023613