DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling
Main Authors:
Format: Journal Article
Language: English
Published: 25-09-2024
Subjects:
Online Access: Get full text
Summary: In this paper, we present an effective data augmentation framework leveraging a Large Language Model (LLM) and a Diffusion Model (DM) to tackle the challenges inherent in data-scarce scenarios. Recently, DMs have opened up the possibility of generating synthetic images to complement a few training images. However, increasing the diversity of synthetic images also raises the risk of generating samples outside the target distribution. Our approach addresses this issue by embedding novel semantic information into text prompts via an LLM and utilizing real images as visual prompts, thus generating semantically rich images. To ensure that the generated images remain within the target distribution, we dynamically adjust the guidance weight based on each image's CLIPScore to control diversity. Experimental results show that our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution. Consequently, our approach proves to be more efficient in the few-shot setting on several benchmarks. Our code is available at https://github.com/kkyuhun94/dalda.
DOI: 10.48550/arxiv.2409.16949
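The summary's central mechanism, adjusting the diffusion guidance weight per image based on its CLIPScore, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the Stable Diffusion checkpoint, the torchmetrics CLIPScore metric, the target score, step size, and the raise-when-below / lower-when-above update rule are all assumptions chosen for the sketch.

```python
# Minimal sketch of CLIPScore-based adaptive guidance scaling.
# Assumptions: Stable Diffusion v1.5 via `diffusers`, CLIPScore via
# `torchmetrics`; thresholds and the update rule are illustrative only.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline
from torchmetrics.multimodal.clip_score import CLIPScore

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16").to(device)

def generate_with_adaptive_guidance(
    prompt: str,
    target_score: float = 30.0,  # assumed CLIPScore threshold for "on-distribution"
    w_init: float = 7.5,         # standard classifier-free guidance weight
    w_min: float = 2.0,
    w_max: float = 15.0,
    step: float = 1.5,           # assumed per-round adjustment size
    rounds: int = 4,
):
    """Regenerate an image while nudging the guidance weight so that
    CLIPScore stays near the target: raise guidance when the image
    drifts off-prompt, lower it when adherence is already high."""
    w = w_init
    image = None
    for _ in range(rounds):
        image = pipe(prompt, guidance_scale=w).images[0]
        # PIL image (HWC, uint8) -> CHW tensor for the metric.
        img = torch.from_numpy(np.array(image)).permute(2, 0, 1).to(device)
        score = clip_score(img, prompt).item()
        if score < target_score:
            w = min(w + step, w_max)  # off-target: pull back toward the prompt
        else:
            w = max(w - step, w_min)  # on-target: relax guidance for diversity
    return image, w
```

Lowering the guidance weight once the score is already above the target is what lets the sampler explore more diverse images, while the raise branch keeps low-scoring samples from drifting outside the target distribution.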