Search Results - "Hejna, Joey"
-
1
Inverse Preference Learning: Preference-based RL without a Reward Function
Published 24-05-2023 “…Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these…”
Journal Article -
2
Few-Shot Preference Learning for Human-in-the-Loop RL
Published 06-12-2022 “…While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has…”
Journal Article -
3
Improving Long-Horizon Imitation Through Instruction Prediction
Published 21-06-2023 “…Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in…”
Journal Article -
4
Distance Weighted Supervised Learning for Offline Interaction Data
Published 26-04-2023 “…Sequential decision making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods…”
Journal Article -
5
MotIF: Motion Instruction Fine-tuning
Published 16-09-2024 “…While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state - e.g., if an apple is…”
Journal Article -
6
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning
Published 26-08-2024 “…Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that…”
Journal Article -
7
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function
Published 18-04-2024 “…Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex…”
Journal Article -
8
So You Think You Can Scale Up Autonomous Robot Data Collection?
Published 04-11-2024 “…A long-standing goal in robot learning is to develop methods for robots to acquire new skills autonomously. While reinforcement learning (RL) comes with the…”
Journal Article -
9
Extreme Q-Learning: MaxEnt RL without Entropy
Published 05-01-2023 “…Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an…”
Journal Article -
10
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
Published 04-06-2024 “…Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and…”
Journal Article -
11
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Published 02-06-2024 “…Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic…”
Journal Article -
12
Contrastive Preference Learning: Learning from Human Feedback without RL
Published 20-10-2023 “…Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in…”
Journal Article -
13
Vision Language Models are In-Context Value Learners
Published 07-11-2024 “…Predicting temporal progress from visual trajectories is important for intelligent robots that can learn, adapt, and improve. However, learning such progress…”
Journal Article -
14
Octo: An Open-Source Generalist Robot Policy
Published 20-05-2024 “…Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such…”
Journal Article -
15
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Published 19-03-2024 “…The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic…”
Journal Article