L-C4: Language-Based Video Colorization for Creative and Consistent Color
Automatic video colorization is inherently an ill-posed problem because each monochrome frame has multiple optional color candidates. Previous exemplar-based video colorization methods restrict the user's imagination due to the elaborate retrieval process. Alternatively, conditional image color...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: |
07-10-2024
|
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Automatic video colorization is inherently an ill-posed problem because each
monochrome frame has multiple optional color candidates. Previous
exemplar-based video colorization methods restrict the user's imagination due
to the elaborate retrieval process. Alternatively, conditional image
colorization methods combined with post-processing algorithms still struggle to
maintain temporal consistency. To address these issues, we present
Language-based video Colorization for Creative and Consistent Colors (L-C4) to
guide the colorization process using user-provided language descriptions. Our
model is built upon a pre-trained cross-modality generative model, leveraging
its comprehensive language understanding and robust color representation
abilities. We introduce the cross-modality pre-fusion module to generate
instance-aware text embeddings, enabling the application of creative colors.
Additionally, we propose temporally deformable attention to prevent flickering
or color shifts, and cross-clip fusion to maintain long-term color consistency.
Extensive experimental results demonstrate that L-C4 outperforms relevant
methods, achieving semantically accurate colors, unrestricted creative
correspondence, and temporally robust consistency. |
---|---|
DOI: | 10.48550/arxiv.2410.04972 |