Search Results - "McCandlish, Sam"
1. Scaling Laws for Transfer
   Published 01-02-2021
   “…We study empirical scaling laws for transfer learning between distributions in an unsupervised, fine-tuning setting. When we train increasingly large neural…”
   Journal Article
2. Towards Understanding Sycophancy in Language Models
   Published 20-10-2023
   “…Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful…”
   Journal Article
3. Studying Large Language Model Generalization with Influence Functions
   Published 07-08-2023
   “…When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of…”
   Journal Article
4. Towards Measuring the Representation of Subjective Global Opinions in Language Models
   Published 28-06-2023
   “…Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to…”
   Journal Article
5. An Empirical Model of Large-Batch Training
   Published 14-12-2018
   “…In an increasing number of domains it has been demonstrated that deep learning models can be trained using relatively large batch sizes without sacrificing…”
   Journal Article
6. Toy Models of Superposition
   Published 21-09-2022
   “…Neural networks often pack many unrelated concepts into a single neuron - a puzzling phenomenon known as 'polysemanticity' which makes interpretability much…”
   Journal Article
7. Measuring Faithfulness in Chain-of-Thought Reasoning
   Published 16-07-2023
   “…Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear…”
   Journal Article
8. Question Decomposition Improves the Faithfulness of Model-Generated Reasoning
   Published 16-07-2023
   “…As large language models (LLMs) perform more difficult tasks, it becomes harder to verify the correctness and safety of their behavior. One approach to help…”
   Journal Article
9. Specific versus General Principles for Constitutional AI
   Published 20-10-2023
   “…Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a…”
   Journal Article
10. Scaling Laws and Interpretability of Learning from Repeated Data
    Published 20-05-2022
    “…Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher…”
    Journal Article
11. In-context Learning and Induction Heads
    Published 23-09-2022
    “…"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present…”
    Journal Article
12. Predictability and Surprise in Large Generative Models
    Published 03-10-2022
    “…Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG,…”
    Journal Article
13. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
    Published 12-04-2022
    “…We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We…”
    Journal Article
14. The Capacity for Moral Self-Correction in Large Language Models
    Published 14-02-2023
    “…We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to…”
    Journal Article
15. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
    Published 23-08-2022
    “…We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful…”
    Journal Article
16. Language Models (Mostly) Know What They Know
    Published 11-07-2022
    “…We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show…”
    Journal Article
17. Constitutional AI: Harmlessness from AI Feedback
    Published 15-12-2022
    “…As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant…”
    Journal Article
18. A General Language Assistant as a Laboratory for Alignment
    Published 01-12-2021
    “…Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human…”
    Journal Article
19. Measuring Progress on Scalable Oversight for Large Language Models
    Published 04-11-2022
    “…Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising systems that…”
    Journal Article
20. Scaling Laws for Neural Language Models
    Published 22-01-2020
    “…We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, and the…”
    Journal Article