Search Results - "Hejna, Joey" :: Catalog Search

1
Inverse Preference Learning: Preference-based RL without a Reward Function by Hejna, Joey, Sadigh, Dorsa

Published 24-05-2023
“…Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these…”

Journal Article

2
Few-Shot Preference Learning for Human-in-the-Loop RL by Hejna, Joey, Sadigh, Dorsa

Published 06-12-2022
“…While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has…”

Journal Article

3
Improving Long-Horizon Imitation Through Instruction Prediction by Hejna, Joey, Abbeel, Pieter, Pinto, Lerrel

Published 21-06-2023
“…Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in…”

Journal Article

4
Distance Weighted Supervised Learning for Offline Interaction Data by Hejna, Joey, Gao, Jensen, Sadigh, Dorsa

Published 26-04-2023
“…Sequential decision making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods…”

Journal Article

5
MotIF: Motion Instruction Fine-tuning by Hwang, Minyoung, Hejna, Joey, Sadigh, Dorsa, Bisk, Yonatan

Published 16-09-2024
“…While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state - e.g., if an apple is…”

Journal Article

6
Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning by Hejna, Joey, Bhateja, Chethan, Jian, Yichen, Pertsch, Karl, Sadigh, Dorsa

Published 26-08-2024
“…Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that…”

Journal Article

7
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function by Rafailov, Rafael, Hejna, Joey, Park, Ryan, Finn, Chelsea

Published 18-04-2024
“…Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex…”

Journal Article

8
So You Think You Can Scale Up Autonomous Robot Data Collection? by Mirchandani, Suvir, Belkhale, Suneel, Hejna, Joey, Choi, Evelyn, Islam, Md Sazzad, Sadigh, Dorsa

Published 04-11-2024
“…A long-standing goal in robot learning is to develop methods for robots to acquire new skills autonomously. While reinforcement learning (RL) comes with the…”

Journal Article

9
Extreme Q-Learning: MaxEnt RL without Entropy by Garg, Divyansh, Hejna, Joey, Geist, Matthieu, Ermon, Stefano

Published 05-01-2023
“…Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an…”

Journal Article

10
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms by Rafailov, Rafael, Chittepu, Yaswanth, Park, Ryan, Sikchi, Harshit, Hejna, Joey, Knox, Bradley, Finn, Chelsea, Niekum, Scott

Published 04-06-2024
“…Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and…”

Journal Article

11
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback by Shaikh, Omar, Lam, Michelle, Hejna, Joey, Shao, Yijia, Bernstein, Michael, Yang, Diyi

Published 02-06-2024
“…Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic…”

Journal Article

12
Contrastive Preference Learning: Learning from Human Feedback without RL by Hejna, Joey, Rafailov, Rafael, Sikchi, Harshit, Finn, Chelsea, Niekum, Scott, Knox, W. Bradley, Sadigh, Dorsa

Published 20-10-2023
“…Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in…”

Journal Article

13
Vision Language Models are In-Context Value Learners by Ma, Yecheng Jason, Hejna, Joey, Wahid, Ayzaan, Fu, Chuyuan, Shah, Dhruv, Liang, Jacky, Xu, Zhuo, Kirmani, Sean, Xu, Peng, Driess, Danny, Xiao, Ted, Tompson, Jonathan, Bastani, Osbert, Jayaraman, Dinesh, Yu, Wenhao, Zhang, Tingnan, Sadigh, Dorsa, Xia, Fei

Published 07-11-2024
“…Predicting temporal progress from visual trajectories is important for intelligent robots that can learn, adapt, and improve. However, learning such progress…”

Journal Article

14
Octo: An Open-Source Generalist Robot Policy by Octo Model Team, Ghosh, Dibya, Walke, Homer, Pertsch, Karl, Black, Kevin, Mees, Oier, Dasari, Sudeep, Hejna, Joey, Kreiman, Tobias, Xu, Charles, Luo, Jianlan, Tan, You Liang, Chen, Lawrence Yunliang, Sanketi, Pannag, Vuong, Quan, Xiao, Ted, Sadigh, Dorsa, Finn, Chelsea, Levine, Sergey

Published 20-05-2024
“…Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such…”

Journal Article

15
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset by Khazatsky, Alexander, Pertsch, Karl, Nair, Suraj, Balakrishna, Ashwin, Dasari, Sudeep, Karamcheti, Siddharth, Nasiriany, Soroush, Srirama, Mohan Kumar, Chen, Lawrence Yunliang, Ellis, Kirsty, Fagan, Peter David, Hejna, Joey, Itkina, Masha, Lepert, Marion, Ma, Yecheng Jason, Miller, Patrick Tree, Wu, Jimmy, Belkhale, Suneel, Dass, Shivin, Ha, Huy, Jain, Arhan, Lee, Abraham, Lee, Youngwoon, Memmel, Marius, Park, Sungjae, Radosavovic, Ilija, Wang, Kaiyuan, Zhan, Albert, Black, Kevin, Chi, Cheng, Hatch, Kyle Beltran, Lin, Shan, Lu, Jingpei, Mercat, Jean, Rehman, Abdul, Sanketi, Pannag R, Sharma, Archit, Simpson, Cody, Vuong, Quan, Walke, Homer Rich, Wulfe, Blake, Xiao, Ted, Yang, Jonathan Heewon, Yavary, Arefeh, Zhao, Tony Z, Agia, Christopher, Baijal, Rohan, Castro, Mateo Guaman, Chen, Daphne, Chen, Qiuyu, Chung, Trinity, Drake, Jaimyn, Foster, Ethan Paul, Gao, Jensen, Herrera, David Antonio, Heo, Minho, Hsu, Kyle, Hu, Jiaheng, Jackson, Donovon, Le, Charlotte, Li, Yunshuang, Lin, Kevin, Lin, Roy, Ma, Zehan, Maddukuri, Abhiram, Mirchandani, Suvir, Morton, Daniel, Nguyen, Tony, O'Neill, Abigail, Scalise, Rosario, Seale, Derick, Son, Victor, Tian, Stephen, Tran, Emi, Wang, Andrew E, Wu, Yilin, Xie, Annie, Yang, Jingyun, Yin, Patrick, Zhang, Yunchu, Bastani, Osbert, Berseth, Glen, Bohg, Jeannette, Goldberg, Ken, Gupta, Abhinav, Gupta, Abhishek, Jayaraman, Dinesh, Lim, Joseph J, Malik, Jitendra, Martín-Martín, Roberto, Ramamoorthy, Subramanian, Sadigh, Dorsa, Song, Shuran, Wu, Jiajun, Yip, Michael C, Zhu, Yuke, Kollar, Thomas, Levine, Sergey, Finn, Chelsea

Published 19-03-2024
“…The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic…”

Journal Article