Eliciting Human Preferences with Language Models
Main Authors: | , , , |
---|---|
Format: | Journal Article |
Language: | English |
Published: | 17-10-2023 |
Subjects: | |
Summary: | Language models (LMs) can be directed to perform target tasks by using
labeled examples or natural language prompts. But selecting examples or writing
prompts can be challenging--especially in tasks that involve unusual edge
cases, demand precise articulation of nebulous preferences, or require an
accurate mental model of LM behavior. We propose to use *LMs themselves* to
guide the task specification process. In this paper, we introduce **Generative
Active Task Elicitation (GATE)**: a learning framework in which models elicit
and infer intended behavior through free-form, language-based interaction with
users. We study GATE in three domains: email validation, content
recommendation, and moral reasoning. In preregistered experiments, we show that
LMs prompted to perform GATE (e.g., by generating open-ended questions or
synthesizing informative edge cases) elicit responses that are often more
informative than user-written prompts or labels. Users report that interactive
task elicitation requires less effort than prompting or example labeling and
surfaces novel considerations not initially anticipated by users. Our findings
suggest that LM-driven elicitation can be a powerful tool for aligning models
to complex human preferences and values. |
---|---|
DOI: | 10.48550/arxiv.2310.11589 |