Multi-channel Temporal Convolution Fusion for Multimodal Sentiment Analysis

Bibliographic Details
Published in: Jisuanji kexue yu tansuo, Vol. 18, No. 11, pp. 3041-3050
Main Authors: SUN Jie, CHE Wengang, GAO Shengxiang
Format: Journal Article
Language: Chinese
Published: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press, 01-11-2024
Description
Summary: Multimodal sentiment analysis has become a prominent research direction in affective computing, extending unimodal analysis to multimodal settings through information fusion. Word-level representation fusion is a key technique for modeling cross-modal interactions, capturing the interplay between elements of different modalities. It faces two main challenges: modeling local interactions between modal elements and modeling global interactions along the temporal dimension. When modeling local interactions, existing methods often adopt attention mechanisms to correlate the overall features of different modalities, ignoring interactions between adjacent elements and local features, and they are computationally expensive. To address these issues, a multi-channel temporal convolution fusion (MCTCF) model is proposed, which uses 2D convolutions to obtain local interactions between modal elements. Specifically, local connections capture associations between neighboring elements, and multi-channel convolutions…
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2309071
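The abstract is truncated at the source, but the mechanism it describes, stacking word-aligned modality sequences and applying 2D convolutions so that local connections capture associations between neighboring elements, can be illustrated. The sketch below is a minimal PyTorch illustration of that general idea, not the paper's implementation; the class name MultiChannelConvFusion, the shapes, and the hyperparameters (three modalities, kernel size 3, eight output channels) are all assumptions.

```python
# Minimal sketch of multi-channel 2D convolution fusion (assumed design,
# not the MCTCF architecture from the paper).
import torch
import torch.nn as nn

class MultiChannelConvFusion(nn.Module):
    """Fuses word-aligned features from several modalities with 2D convolutions.

    Each modality's sequence (batch, seq_len, dim) is stacked as one input
    channel of a (batch, modalities, seq_len, dim) tensor, so small 2D kernels
    mix adjacent time steps and feature dimensions locally, while multiple
    output channels learn different local interaction patterns.
    """

    def __init__(self, num_modalities: int = 3, out_channels: int = 8,
                 kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(
            in_channels=num_modalities,     # one channel per modality
            out_channels=out_channels,      # multiple interaction "views"
            kernel_size=kernel_size,
            padding=kernel_size // 2,       # preserve seq_len and dim
        )
        self.act = nn.ReLU()

    def forward(self, *modalities: torch.Tensor) -> torch.Tensor:
        # modalities: tensors of shape (batch, seq_len, dim), word-aligned
        x = torch.stack(modalities, dim=1)  # (batch, M, seq_len, dim)
        h = self.act(self.conv(x))          # (batch, C, seq_len, dim)
        # Average over channels so downstream temporal models see
        # a (batch, seq_len, dim) sequence again.
        return h.mean(dim=1)

# Usage: three word-aligned modalities (e.g., text, audio, vision).
text = torch.randn(4, 20, 64)
audio = torch.randn(4, 20, 64)
vision = torch.randn(4, 20, 64)
fused = MultiChannelConvFusion()(text, audio, vision)
print(fused.shape)  # torch.Size([4, 20, 64])
```

Because the kernels only span a few adjacent time steps and feature dimensions, this kind of fusion models local interactions with far fewer operations than full pairwise cross-modal attention, which is consistent with the efficiency motivation stated in the abstract.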