HyperMM: Robust Multimodal Learning with Varying-sized Inputs
Main Authors: , , ,
Format: Journal Article
Language: English
Published: 30-07-2024
Summary: Combining multiple modalities carrying complementary information through multimodal learning (MML) has shown considerable benefits for diagnosing multiple pathologies. However, the robustness of multimodal models to missing modalities is often overlooked. Most works assume modality completeness in the input data, while in clinical practice it is common to have incomplete modalities. Existing solutions that address this issue rely on modality imputation strategies before using supervised learning models. These strategies, however, are complex, computationally costly, and can strongly impact subsequent prediction models; they should therefore be used sparingly in sensitive applications such as healthcare. We propose HyperMM, an end-to-end framework designed for learning with varying-sized inputs. Specifically, we focus on the task of supervised MML with missing imaging modalities, without using imputation before training. We introduce a novel strategy for training a universal feature extractor using a conditional hypernetwork, and propose a permutation-invariant neural network that can handle inputs of varying dimensions to process the extracted features, in a two-phase, task-agnostic framework. We experimentally demonstrate the advantages of our method on two tasks: Alzheimer's disease detection and breast cancer classification. We show that our strategy is robust to high rates of missing data and that its flexibility allows it to handle varying-sized datasets beyond the missing-modality scenario.
DOI: 10.48550/arxiv.2407.20768
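
The summary names two architectural ingredients: a universal feature extractor whose weights are generated by a hypernetwork conditioned on the modality, and a permutation-invariant network that aggregates however many feature vectors are present. Below is a minimal PyTorch sketch of that general idea, not the paper's implementation: the linear extractor, embedding size, mean pooling, and all class and parameter names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalHyperExtractor(nn.Module):
    """Sketch of a hypernetwork-conditioned feature extractor: a modality
    embedding generates the weights of a small linear extractor, so one
    module serves every modality (assumed design, not from the paper)."""
    def __init__(self, num_modalities: int, in_dim: int, feat_dim: int):
        super().__init__()
        self.in_dim, self.feat_dim = in_dim, feat_dim
        self.embed = nn.Embedding(num_modalities, 64)       # conditioning code
        self.weight_gen = nn.Linear(64, in_dim * feat_dim)  # generates W
        self.bias_gen = nn.Linear(64, feat_dim)             # generates b

    def forward(self, x: torch.Tensor, modality_id: torch.Tensor) -> torch.Tensor:
        z = self.embed(modality_id)                          # (64,)
        W = self.weight_gen(z).view(self.feat_dim, self.in_dim)
        b = self.bias_gen(z)
        return torch.relu(x @ W.t() + b)                     # (batch, feat_dim)

class PermutationInvariantHead(nn.Module):
    """Deep Sets-style head: encode each available modality's features,
    pool with a symmetric function (mean), then classify. The output is
    invariant to both the order and the number of modalities."""
    def __init__(self, feat_dim: int, hidden: int, num_classes: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.rho = nn.Linear(hidden, num_classes)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        pooled = torch.stack([self.phi(f) for f in feats]).mean(dim=0)
        return self.rho(pooled)

# Varying-sized input: only modalities 0 and 2 of 3 are available here,
# and no imputation is performed for the missing modality 1.
extractor = ConditionalHyperExtractor(num_modalities=3, in_dim=128, feat_dim=32)
head = PermutationInvariantHead(feat_dim=32, hidden=64, num_classes=2)
x0, x2 = torch.randn(4, 128), torch.randn(4, 128)
feats = [extractor(x0, torch.tensor(0)), extractor(x2, torch.tensor(2))]
logits = head(feats)  # shape (4, 2)
```

Because the pooling step is symmetric, samples with one, two, or three available modalities pass through the same network, which matches the abstract's claim of handling varying-sized inputs without imputation.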