Search Results - "Jin, Weike"
1
Graph-Based Multi-Interaction Network for Video Question Answering
Published in IEEE Transactions on Image Processing (2021) “…Video question answering is an important task combining both Natural Language Processing and Computer Vision, which requires a machine to obtain a thorough…”
Journal Article
2
Adaptive Spatio-Temporal Graph Enhanced Vision-Language Representation for Video QA
Published in IEEE Transactions on Image Processing (01-01-2021) “…Vision-language research has become very popular, which focuses on understanding of visual contents, language semantics and relationships between them. Video…”
Journal Article
3
Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks
Published in IEEE Transactions on Circuits and Systems for Video Technology (01-12-2020) “…Video dialog is a new and challenging task, which requires an AI agent to maintain a meaningful dialog with humans in natural language about video contents…”
Journal Article
4
TaoHighlight: Commodity-Aware Multi-Modal Video Highlight Detection in E-Commerce
Published in IEEE Transactions on Multimedia (2022) “…In e-commerce, product related video is important content to introduce product characteristics and attract consumers. Especially in the recommendation system…”
Journal Article
5
MLSLT: Towards Multilingual Sign Language Translation
Published in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2022) “…Most of the research to date focuses on bilingual sign language translation (BSLT). However, such models are in-efficient in building multilingual sign…”
Conference Proceeding
6
Multi-Turn Video Question Generation via Reinforced Multi-Choice Attention Network
Published in IEEE Transactions on Circuits and Systems for Video Technology (01-05-2021) “…Video question generation is a challenging task in visual information retrieval, which generates questions given a sequence of video frames. The existing…”
Journal Article
7
Gloss Attention for Gloss-free Sign Language Translation
Published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2023) “…Most sign language translation (SLT) methods to date require the use of gloss annotations to provide additional supervision information, however, the…”
Conference Proceeding
8
Frame-Subtitle Self-Supervision for Multi-Modal Video Question Answering
Published 08-09-2022 “…Multi-modal video question answering aims to predict correct answer and localize the temporal boundary relevant to the question. The temporal annotations of…”
Journal Article
9
Gloss Attention for Gloss-free Sign Language Translation
Published 14-07-2023 “…Most sign language translation (SLT) methods to date require the use of gloss annotations to provide additional supervision information, however, the…”
Journal Article
10
VLAD-VSA: Cross-Domain Face Presentation Attack Detection with Vocabulary Separation and Adaptation
Published 21-02-2022 “…For face presentation attack detection (PAD), most of the spoofing cues are subtle, local image patterns (e.g., local image distortion, 3D mask edge and cut…”
Journal Article
11
SimulSLT: End-to-End Simultaneous Sign Language Translation
Published 08-12-2021 “…Sign language translation as a kind of technology with profound social significance has attracted growing researchers' interest in recent years. However, the…”
Journal Article
12
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Published 26-09-2024 “…GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However,…”
Journal Article