Search Results - "Hong, Joey"
-
1
Rules of the Road: Predicting Driving Behavior With a Convolutional Model of Semantic Interactions
Published in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (01-06-2019) “…We focus on the problem of predicting future states of entities in complex, real-world driving scenarios. Previous research has approached this problem via…”
Conference Proceeding -
2
Wearing the Witch Identity as a Way of Becoming in Shirley Jackson's We Have Always Lived in the Castle
Published 01-01-2022 “…This thesis aims to analyze Shirley Jackson’s last novel We Have Always Lived in the Castle (1962) through the lens of 17th-century New England history and…”
Dissertation -
3
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
Published 07-11-2024 “…Value-based reinforcement learning (RL) can in principle learn effective policies for a wide range of multi-turn problems, from games to dialogue to robotic…”
Journal Article -
4
Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations
Published 09-11-2023 “…Large language models (LLMs) have emerged as powerful and general solutions to many natural language tasks. However, many of the most important applications of…”
Journal Article -
5
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Published 31-10-2023 “…Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a dataset consisting only of suboptimal trials. One way that this…”
Journal Article -
6
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations
Published 07-11-2024 “…Recent progress on large language models (LLMs) has enabled dialogue agents to generate highly naturalistic and plausible text. However, current LLM language…”
Journal Article -
7
Learning to Influence Human Behavior with Offline Reinforcement Learning
Published 03-03-2023 “…When interacting with people, AI agents do not just influence the state of the world -- they also influence the actions people take in response to the agent,…”
Journal Article -
8
On the Sensitivity of Reward Inference to Misspecified Human Models
Published 09-12-2022 “…Inferring reward functions from human behavior is at the center of value alignment - aligning AI objectives with what we, humans, actually want. But doing so…”
Journal Article -
9
Confidence-Conditioned Value Functions for Offline Reinforcement Learning
Published 08-12-2022 “…Offline reinforcement learning (RL) promises the ability to learn effective policies solely using existing, static datasets, without any costly online…”
Journal Article -
10
Strategically Conservative Q-Learning
Published 06-06-2024 “…Offline reinforcement learning (RL) is a compelling paradigm to extend RL's practical utility by leveraging pre-collected, static datasets, thereby avoiding…”
Journal Article -
11
Multi-Task Off-Policy Learning from Bandit Feedback
Published 09-12-2022 “…Many practical applications, such as recommender systems and learning to rank, involve solving multiple similar tasks. One example is learning of…”
Journal Article -
12
When Should We Prefer Offline Reinforcement Learning Over Behavioral Cloning?
Published 12-04-2022 “…Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing previously collected experience, without any online interaction. It…”
Journal Article -
13
Compositional Generalization and Decomposition in Neural Program Synthesis
Published 07-04-2022 “…When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to…”
Journal Article -
14
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Published 29-11-2023 “…Large language models (LLMs) provide excellent text-generation capabilities, but standard prompting and generation methods generally do not lead to intentional…”
Journal Article -
15
Deep Hierarchy in Bandits
Published 03-02-2022 “…Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of a user for…”
Journal Article -
16
Hierarchical Bayesian Bandits
Published 12-11-2021 “…Meta-, multi-task, and federated learning can be all viewed as solving similar tasks, drawn from a distribution that reflects task similarities. We provide a…”
Journal Article -
17
ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis
Published 25-07-2023 “…When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to…”
Journal Article -
18
Thompson Sampling with a Mixture Prior
Published 10-06-2021 “…We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in…”
Journal Article -
19
Rules of the Road: Predicting Driving Behavior with a Convolutional Model of Semantic Interactions
Published 21-06-2019 “…The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 8454-8462. We focus on the problem of predicting future states of entities in…”
Journal Article -
20
Latent Programmer: Discrete Latent Codes for Program Synthesis
Published 01-12-2020 “…In many sequence learning tasks, such as program synthesis and document summarization, a key problem is searching over a large space of possible output…”
Journal Article