Multi-task disagreement-reducing multimodal sentiment fusion network

Bibliographic Details
Published in:	Image and vision computing Vol. 149; p. 105158
Main Authors:	Zijun, Wang; Naicheng, Jiang; Xinyue, Chao; Bin, Sun
Format:	Journal Article
Language:	English
Published:	Elsevier B.V., 01-09-2024
Subjects:	Multi-task learning; Multimodal fusion; Multimodal sentiment analysis; Sentiment disagreement

Description
Summary:	Existing multimodal sentiment analysis models can effectively capture sentimental commonalities between different modalities and possess high sentimental acquisition capability. However, these models still show shortcomings in analysis and recognition when dealing with samples that exhibit sentimental polarity disagreement between different modalities. Additionally, the dominance of the text modality in multimodal models, particularly those pre-trained with BERT, can hinder the learning of other modalities because of its richer semantic information. This issue becomes particularly pronounced when the multimodal and textual sentimental polarities conflict, often leading to suboptimal analytical results. Moreover, single-task learning suppresses the classification ability of each modality. In this paper, we propose a Multi-Task disagreement-Reducing Multimodal Sentiment Fusion Network (MtDr-MSF), designed to enhance the semantic information of non-text modalities, reduce the dominant impact of the textual modality on the model, and improve the learning capabilities of unimodal networks. We conducted experiments on three multimodal sentiment analysis datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results show that our method outperforms the current state-of-the-art (SOTA) method.
We can summarize the main contributions of our paper as follows:
• Revealing the presence of special samples with contradictory sentiment polarities.
• Considering the potential issue that may arise from the dominance of the textual modality.
• Proposing a network to reduce the dominance of text and enhance the expressive ability of the individual modalities.
• Testing our model on three datasets, CMU-MOSI, CMU-MOSEI, and CH-SIMS, where it achieves SOTA results.
ISSN:	0262-8856
DOI:	10.1016/j.imavis.2024.105158