Multi-task disagreement-reducing multimodal sentiment fusion network
Published in: | Image and vision computing Vol. 149; p. 105158 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier B.V, 01-09-2024 |
Summary: | Existing multimodal sentiment analysis models can effectively capture sentiment commonalities between different modalities and possess a high capability for acquiring sentiment. However, their analysis and recognition abilities still fall short on samples that exhibit sentiment polarity disagreement between modalities. Additionally, because the text modality carries richer semantic information, its dominance in multimodal models, particularly those pre-trained with BERT, can hinder the learning of the other modalities. This issue becomes particularly pronounced when the multimodal and textual sentiment polarities conflict, often leading to suboptimal analytical results. Moreover, single-task learning suppresses the classification ability of each modality. In this paper, we propose a Multi-Task disagreement-Reducing Multimodal Sentiment Fusion Network (MtDr-MSF), designed to enhance the semantic information of non-text modalities, reduce the dominant impact of the textual modality on the model, and improve the learning capabilities of the unimodal networks. We conducted experiments on three multimodal sentiment analysis datasets: CMU-MOSI, CMU-MOSEI, and CH-SIMS. The results show that our method outperforms the current state-of-the-art (SOTA) method.
The main contributions of our paper can be summarized as follows:
• Revealing the presence of special samples with contradictory sentiment polarities.
• Considering the potential issue that may arise from the dominance of the textual modality.
• Proposing a network that reduces the dominance of text and enhances the expressive ability of the individual modalities.
• Testing our model on three datasets, CMU-MOSI, CMU-MOSEI, and CH-SIMS, where it achieves SOTA results. |
---|---|
ISSN: | 0262-8856 |
DOI: | 10.1016/j.imavis.2024.105158 |