On Identification and Retrieval of Near-Duplicate Biological Images: a New Dataset and Protocol

Bibliographic Details
Published in:	2020 25th International Conference on Pattern Recognition (ICPR), pp. 3114-3121
Main Authors:	Koker, T.E., Chintapalli, S.S., Wang, S., Talbot, B.A., Wainstock, D., Cicconet, M., Walsh, M.C.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 10-01-2021
Subjects:	Adaptation models; Biological system modeling; Image retrieval; Measurement; Protocols; Training; Transforms

Description
Summary:	Manipulation and re-use of images in scientific publications is a growing issue, not only for biomedical publishers but also for the research community in general. In this work we introduce BINDER (Bio-Image Near-Duplicate Examples Repository), a novel dataset to help researchers develop, train, and test models that detect same-source biomedical images. BINDER contains 7,490 unique image patches for model training, 1,821 same-size patch duplicates for validation and testing, and 868 different-size image/patch pairs for image retrieval validation and testing. Except for the training set, the patches already contain manipulations, including rotation, translation, scaling, perspective transformation, contrast adjustment, and/or compression artifacts. We further use the dataset to demonstrate how novel adaptations of existing image retrieval and metric learning models can be applied to achieve high-accuracy inference results, creating a baseline for future work. In aggregate, we thus present a supervised protocol for near-duplicate image identification and retrieval that requires no "real-world" training examples. Our dataset and source code are available at hms-idac.github.io/BINDER.
DOI:	10.1109/ICPR48806.2021.9412849
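
Illustrative sketch (not part of the record): the summary lists manipulations such as rotation, translation, and contrast adjustment, and notes that the training patches are unmanipulated. One way to picture how a near-duplicate could be derived from a clean image is shown below. This is a minimal pure-Python sketch under stated assumptions, not the authors' pipeline: the function names (`crop`, `rotate90`, `adjust_contrast`, `make_near_duplicate`), the restriction to 90-degree rotations, and the gain range 0.7 to 1.3 are all hypothetical choices made for illustration. Images are represented as lists of rows of 8-bit grayscale values.

```python
import random


def crop(img, top, left, size):
    """Translation: take a size x size window whose corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def rotate90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def adjust_contrast(img, gain, mid=128):
    """Scale intensities about mid-gray, then clamp to the 8-bit range."""
    return [
        [max(0, min(255, round(mid + gain * (p - mid)))) for p in row]
        for row in img
    ]


def make_near_duplicate(img, size, rng):
    """Derive a manipulated patch from a clean image (hypothetical recipe)."""
    height, width = len(img), len(img[0])
    # Random translation: choose where the patch window sits.
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    patch = crop(img, top, left, size)
    # Random rotation by a multiple of 90 degrees (assumed simplification).
    for _ in range(rng.randint(0, 3)):
        patch = rotate90(patch)
    # Random contrast change; the gain range is an illustrative assumption.
    return adjust_contrast(patch, rng.uniform(0.7, 1.3))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 16 x 16 gradient standing in for a biological image.
    image = [[(r * 16 + c) % 256 for c in range(16)] for r in range(16)]
    duplicate = make_near_duplicate(image, 8, rng)
    print(len(duplicate), len(duplicate[0]))
```

A pair made of a clean patch and the output of `make_near_duplicate` on the same image would count as "same source", while patches from different images would not. Scaling, perspective transformation, and compression artifacts, which the summary also mentions, are omitted here because they need an image library.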