Autonomous Improvement of Instruction Following Skills via Foundation Models
Main Authors:
Format: Journal Article
Language: English
Published: 30-07-2024
Summary: Intelligent instruction-following robots capable of improving from autonomously collected experience have the potential to transform robot learning: instead of collecting costly teleoperated demonstration data, large-scale deployment of fleets of robots can quickly collect larger quantities of autonomous data that can collectively improve their performance. However, autonomous improvement requires solving two key problems: (i) fully automating a scalable data collection procedure that can collect diverse and semantically meaningful robot data, and (ii) learning from non-optimal, autonomous data with no human annotations. To this end, we propose a novel approach that addresses these challenges, allowing instruction-following policies to improve from autonomously collected data without human supervision. Our framework leverages vision-language models to collect and evaluate semantically meaningful experiences in new environments, and then utilizes a decomposition of instruction-following tasks into (semantic) language-conditioned image generation and (non-semantic) goal reaching, which makes it significantly more practical to improve from this autonomously collected data without any human annotations. We carry out extensive real-world experiments to demonstrate the effectiveness of our approach, and find that in a suite of unseen environments, the robot policy can be improved 2x with autonomously collected data. We open-source the code for our semantic autonomous improvement pipeline, as well as our autonomous dataset of 30.5K trajectories collected across five tabletop environments.
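The key technical idea in the summary is the decomposition of instruction following into (semantic) language-conditioned goal image generation and (non-semantic) goal reaching, with a vision-language model standing in for human evaluation. The sketch below illustrates how these pieces could compose into an autonomous data-collection loop; every function, the action dimensionality, and the `env` interface are hypothetical stand-ins for illustration, not the authors' released code.

```python
# Minimal sketch of the decomposition described in the summary: a (semantic)
# language-conditioned image generation step proposes a goal image, a
# (non-semantic) goal-conditioned policy reaches it, and a VLM labels success.
# All components below are hypothetical stand-ins, not the paper's actual API.
import numpy as np


def generate_goal_image(instruction: str, obs: np.ndarray) -> np.ndarray:
    # Semantic step: an instruction-conditioned image-editing model would
    # synthesize an image of the desired outcome here; stubbed as identity.
    return obs.copy()


def goal_conditioned_policy(obs: np.ndarray, goal: np.ndarray) -> np.ndarray:
    # Non-semantic step: a goal-reaching policy trainable on action-labeled
    # data without any language annotations; stubbed with a zero action.
    return np.zeros(7)  # e.g. 6-DoF end-effector delta + gripper (assumed)


def vlm_judge_success(instruction: str, final_obs: np.ndarray) -> bool:
    # A vision-language model scores whether the final image satisfies the
    # instruction, replacing human annotation of autonomous rollouts.
    return False  # stub


def autonomous_rollout(env, instruction: str, max_steps: int = 50):
    """Collect one autonomous trajectory and self-label it for training."""
    obs = env.reset()
    goal = generate_goal_image(instruction, obs)     # semantic, done once
    trajectory = []
    for _ in range(max_steps):
        action = goal_conditioned_policy(obs, goal)  # non-semantic, per step
        obs_next = env.step(action)
        trajectory.append((obs, action, obs_next))
        obs = obs_next
    success = vlm_judge_success(instruction, obs)    # no human supervision
    return trajectory, success
```

Under this decomposition, only the goal-reaching policy consumes the collected action data, and the VLM supplies the success labels, so the loop can in principle improve from autonomous experience with no human annotations, consistent with the claim in the summary.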
DOI: 10.48550/arxiv.2407.20635