Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation

Bibliographic Details
Published in: Medical Image Analysis, Vol. 63, p. 101694
Main Authors: Wen, Junhao; Thibeau-Sutre, Elina; Diaz-Melo, Mauricio; Samper-González, Jorge; Routier, Alexandre; Bottani, Simona; Dormont, Didier; Durrleman, Stanley; Burgos, Ninon; Colliot, Olivier
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 01-07-2020
Description
Summary:
• We reviewed the state-of-the-art on classification of AD based on CNN and T1 MRI.
• We unveiled data leakage, leading to biased results, in some reviewed studies.
• We proposed a framework for reproducible evaluation of AD classification methods.
• We demonstrated the use of the proposed framework on three public datasets.
• We assessed generalizability both within a dataset and between datasets.

Numerous machine learning (ML) approaches have been proposed for automatic classification of Alzheimer's disease (AD) from brain imaging data. In particular, over 30 papers have proposed to use convolutional neural networks (CNN) for AD classification from anatomical MRI. However, the classification performance is difficult to compare across studies due to variations in components such as participant selection, image preprocessing or validation procedure. Moreover, these studies are hardly reproducible because their frameworks are not publicly accessible and because implementation details are lacking. Lastly, some of these papers may report a biased performance due to inadequate or unclear validation or model selection procedures.

In the present work, we aim to address these limitations through three main contributions. First, we performed a systematic literature review. We identified four main types of approaches: i) 2D slice-level, ii) 3D patch-level, iii) ROI-based and iv) 3D subject-level CNN. Moreover, we found that more than half of the surveyed papers may have suffered from data leakage and thus reported biased performance. Our second contribution is the extension of our open-source framework for classification of AD using CNN and T1-weighted MRI. The framework comprises previously developed tools to automatically convert ADNI, AIBL and OASIS data into the BIDS standard, and a modular set of image preprocessing procedures, classification architectures and evaluation procedures dedicated to deep learning.
Finally, we used this framework to rigorously compare different CNN architectures. The data was split into training/validation/test sets at the very beginning and only the training/validation sets were used for model selection. To avoid any overfitting, the test sets were left untouched until the end of the peer-review process. Overall, the different 3D approaches (3D-subject, 3D-ROI, 3D-patch) achieved similar performances while that of the 2D slice approach was lower. Of note, the different CNN approaches did not perform better than an SVM with voxel-based features. The different approaches generalized well to similar populations but not to datasets with different inclusion criteria or demographic characteristics. All the code of the framework and the experiments is publicly available: general-purpose tools have been integrated into the Clinica software (www.clinica.run) and the paper-specific code is available at: https://github.com/aramis-lab/AD-DL.
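
The splitting step described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation (that code is in the AD-DL repository). It only shows the general idea of splitting at the subject level, before any slices or patches are extracted, so that no participant contributes images to more than one set. The function name `split_by_subject`, the fractions, and the `sub-`/`ses-` identifiers are assumptions made for illustration, and the sketch does not stratify by diagnosis, age or sex.

```python
import random
from collections import defaultdict


def split_by_subject(scans, val_frac=0.2, test_frac=0.2, seed=0):
    """Split (subject_id, scan_id) pairs into train/validation/test by subject.

    All scans of a given subject end up in exactly one set, which is what
    prevents subject-level data leakage. Hypothetical helper for illustration.
    """
    # Group scans by subject so that a subject is moved as a single unit.
    by_subject = defaultdict(list)
    for subject_id, scan_id in scans:
        by_subject[subject_id].append(scan_id)

    # Shuffle the subjects (not the scans) with a fixed seed for reproducibility.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_test = max(1, round(len(subjects) * test_frac))
    n_val = max(1, round(len(subjects) * val_frac))
    groups = {
        "test": subjects[:n_test],
        "validation": subjects[n_test:n_test + n_val],
        "train": subjects[n_test + n_val:],
    }
    return {
        name: [(s, scan) for s in ids for scan in by_subject[s]]
        for name, ids in groups.items()
    }


# Toy data: 10 subjects with two visits each (identifiers are made up).
scans = [(f"sub-{i:02d}", f"ses-M{m:02d}") for i in range(10) for m in (0, 12)]
splits = split_by_subject(scans)
for name, items in splits.items():
    print(name, len(items), "scans from", len({s for s, _ in items}), "subjects")
```

In such a setup, the test list would be written to disk once and left unread until model selection on the training and validation sets is finished.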
ISSN: 1361-8415, 1361-8423
DOI: 10.1016/j.media.2020.101694