Learning Joint Representation of Human Motion and Language
Main Authors:
Format: Journal Article
Language: English
Published: 27-10-2022
Summary: In this work, we present MoLang (a Motion-Language connecting model) for learning a joint representation of human motion and language, leveraging both unpaired and paired datasets of the motion and language modalities. To this end, we propose a motion-language model trained with contrastive learning, enabling it to learn representations of the human motion domain that generalize better. Empirical results show that our model learns strong representations of human motion data by leveraging the language modality. The proposed method performs both action recognition and motion retrieval with a single model, outperforming state-of-the-art approaches on a number of action recognition benchmarks.
DOI: 10.48550/arxiv.2210.15187
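The summary describes contrastive training that aligns motion and language embeddings. As a rough illustration only (not the paper's actual implementation; all names, shapes, and the temperature value here are assumptions), a symmetric InfoNCE-style objective over a batch of matched motion/text embedding pairs can be sketched as:

```python
import numpy as np

def info_nce(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (motion, text) pairs."""
    # L2-normalize so that dot products become cosine similarities
    m = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = m @ t.T / temperature          # (N, N) similarity matrix
    labels = np.arange(len(m))              # matched pairs sit on the diagonal

    def xent(l):
        # numerically stable cross-entropy with diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the motion->text and text->motion directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_matched = info_nce(emb, emb)         # perfectly aligned pairs: low loss
loss_mismatched = info_nce(emb, emb[::-1])  # shuffled pairs: higher loss
```

The loss pulls each motion embedding toward its paired text embedding while pushing it away from the other captions in the batch, which is the general mechanism the summary's "contrastive learning" refers to.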