Search Results - "Hejna, Joey"

  • Showing results 1 - 15 of 15
  1.

    Inverse Preference Learning: Preference-based RL without a Reward Function by Hejna, Joey, Sadigh, Dorsa

    Published 24-05-2023
    “…Reward functions are difficult to design and often hard to align with human intent. Preference-based Reinforcement Learning (RL) algorithms address these…”
    Journal Article
  2.

    Few-Shot Preference Learning for Human-in-the-Loop RL by Hejna, Joey, Sadigh, Dorsa

    Published 06-12-2022
    “…While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has…”
    Journal Article
  3.

    Improving Long-Horizon Imitation Through Instruction Prediction by Hejna, Joey, Abbeel, Pieter, Pinto, Lerrel

    Published 21-06-2023
    “…Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. Difficulties in such settings are exacerbated in…”
    Journal Article
  4.

    Distance Weighted Supervised Learning for Offline Interaction Data by Hejna, Joey, Gao, Jensen, Sadigh, Dorsa

    Published 26-04-2023
    “…Sequential decision making algorithms often struggle to leverage different sources of unstructured offline interaction data. Imitation learning (IL) methods…”
    Journal Article
  5.

    MotIF: Motion Instruction Fine-tuning by Hwang, Minyoung, Hejna, Joey, Sadigh, Dorsa, Bisk, Yonatan

    Published 16-09-2024
    “…While success in many robotics tasks can be determined by only observing the final state and how it differs from the initial state - e.g., if an apple is…”
    Journal Article
  6.

    Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning by Hejna, Joey, Bhateja, Chethan, Jian, Yichen, Pertsch, Karl, Sadigh, Dorsa

    Published 26-08-2024
    “…Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that…”
    Journal Article
  7.

    From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function by Rafailov, Rafael, Hejna, Joey, Park, Ryan, Finn, Chelsea

    Published 18-04-2024
    “…Reinforcement Learning From Human Feedback (RLHF) has been critical to the success of the latest generation of generative AI models. In response to the complex…”
    Journal Article
  8.

    So You Think You Can Scale Up Autonomous Robot Data Collection? by Mirchandani, Suvir, Belkhale, Suneel, Hejna, Joey, Choi, Evelyn, Islam, Md Sazzad, Sadigh, Dorsa

    Published 04-11-2024
    “…A long-standing goal in robot learning is to develop methods for robots to acquire new skills autonomously. While reinforcement learning (RL) comes with the…”
    Journal Article
  9.

    Extreme Q-Learning: MaxEnt RL without Entropy by Garg, Divyansh, Hejna, Joey, Geist, Matthieu, Ermon, Stefano

    Published 05-01-2023
    “…Modern Deep Reinforcement Learning (RL) algorithms require estimates of the maximal Q-value, which are difficult to compute in continuous domains with an…”
    Journal Article
  10.

    Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms by Rafailov, Rafael, Chittepu, Yaswanth, Park, Ryan, Sikchi, Harshit, Hejna, Joey, Knox, Bradley, Finn, Chelsea, Niekum, Scott

    Published 04-06-2024
    “…Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent success of Large Language Models (LLMs), however, it is often a complex and…”
    Journal Article
  11.

    Show, Don't Tell: Aligning Language Models with Demonstrated Feedback by Shaikh, Omar, Lam, Michelle, Hejna, Joey, Shao, Yijia, Bernstein, Michael, Yang, Diyi

    Published 02-06-2024
    “…Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic…”
    Journal Article
  12.

    Contrastive Preference Learning: Learning from Human Feedback without RL by Hejna, Joey, Rafailov, Rafael, Sikchi, Harshit, Finn, Chelsea, Niekum, Scott, Knox, W. Bradley, Sadigh, Dorsa

    Published 20-10-2023
    “…Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for aligning models with human intent. Typically RLHF algorithms operate in…”
    Journal Article
  13.

    Vision Language Models are In-Context Value Learners by Ma, Yecheng Jason, Hejna, Joey, Wahid, Ayzaan, Fu, Chuyuan, Shah, Dhruv, Liang, Jacky, Xu, Zhuo, Kirmani, Sean, Xu, Peng, Driess, Danny, Xiao, Ted, Tompson, Jonathan, Bastani, Osbert, Jayaraman, Dinesh, Yu, Wenhao, Zhang, Tingnan, Sadigh, Dorsa, Xia, Fei

    Published 07-11-2024
    “…Predicting temporal progress from visual trajectories is important for intelligent robots that can learn, adapt, and improve. However, learning such progress…”
    Journal Article
  14.

    Octo: An Open-Source Generalist Robot Policy by Octo Model Team, Ghosh, Dibya, Walke, Homer, Pertsch, Karl, Black, Kevin, Mees, Oier, Dasari, Sudeep, Hejna, Joey, Kreiman, Tobias, Xu, Charles, Luo, Jianlan, Tan, You Liang, Chen, Lawrence Yunliang, Sanketi, Pannag, Vuong, Quan, Xiao, Ted, Sadigh, Dorsa, Finn, Chelsea, Levine, Sergey

    Published 20-05-2024
    “…Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such…”
    Journal Article
  15.

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset by Khazatsky, Alexander, Pertsch, Karl, Nair, Suraj, Balakrishna, Ashwin, Dasari, Sudeep, Karamcheti, Siddharth, Nasiriany, Soroush, Srirama, Mohan Kumar, Chen, Lawrence Yunliang, Ellis, Kirsty, Fagan, Peter David, Hejna, Joey, Itkina, Masha, Lepert, Marion, Ma, Yecheng Jason, Miller, Patrick Tree, Wu, Jimmy, Belkhale, Suneel, Dass, Shivin, Ha, Huy, Jain, Arhan, Lee, Abraham, Lee, Youngwoon, Memmel, Marius, Park, Sungjae, Radosavovic, Ilija, Wang, Kaiyuan, Zhan, Albert, Black, Kevin, Chi, Cheng, Hatch, Kyle Beltran, Lin, Shan, Lu, Jingpei, Mercat, Jean, Rehman, Abdul, Sanketi, Pannag R, Sharma, Archit, Simpson, Cody, Vuong, Quan, Walke, Homer Rich, Wulfe, Blake, Xiao, Ted, Yang, Jonathan Heewon, Yavary, Arefeh, Zhao, Tony Z, Agia, Christopher, Baijal, Rohan, Castro, Mateo Guaman, Chen, Daphne, Chen, Qiuyu, Chung, Trinity, Drake, Jaimyn, Foster, Ethan Paul, Gao, Jensen, Herrera, David Antonio, Heo, Minho, Hsu, Kyle, Hu, Jiaheng, Jackson, Donovon, Le, Charlotte, Li, Yunshuang, Lin, Kevin, Lin, Roy, Ma, Zehan, Maddukuri, Abhiram, Mirchandani, Suvir, Morton, Daniel, Nguyen, Tony, O'Neill, Abigail, Scalise, Rosario, Seale, Derick, Son, Victor, Tian, Stephen, Tran, Emi, Wang, Andrew E, Wu, Yilin, Xie, Annie, Yang, Jingyun, Yin, Patrick, Zhang, Yunchu, Bastani, Osbert, Berseth, Glen, Bohg, Jeannette, Goldberg, Ken, Gupta, Abhinav, Gupta, Abhishek, Jayaraman, Dinesh, Lim, Joseph J, Malik, Jitendra, Martín-Martín, Roberto, Ramamoorthy, Subramanian, Sadigh, Dorsa, Song, Shuran, Wu, Jiajun, Yip, Michael C, Zhu, Yuke, Kollar, Thomas, Levine, Sergey, Finn, Chelsea

    Published 19-03-2024
    “…The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic…”
    Journal Article