Search Results - "Olah, Chris"
1
Scaling Laws and Interpretability of Learning from Repeated Data
Published 20-05-2022 “…Recent large language models have been trained on vast datasets, but also often on repeated data, either intentionally for the purpose of upweighting higher…”
Journal Article
2
In-context Learning and Induction Heads
Published 23-09-2022 “…"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present…”
Journal Article
3
Predictability and Surprise in Large Generative Models
Published 03-10-2022 “…Large-scale pre-training has recently emerged as a technique for creating capable, general purpose, generative models such as GPT-3, Megatron-Turing NLG,…”
Journal Article
4
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Published 12-04-2022 “…We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We…”
Journal Article
5
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Published 23-08-2022 “…We describe our early efforts to red team language models in order to simultaneously discover, measure, and attempt to reduce their potentially harmful…”
Journal Article
6
Language Models (Mostly) Know What They Know
Published 11-07-2022 “…We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show…”
Journal Article
7
A General Language Assistant as a Laboratory for Alignment
Published 01-12-2021 “…Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human…”
Journal Article
8
Concrete Problems in AI Safety
Published 21-06-2016 “…Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In…”
Journal Article
9
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Published 14-03-2016 “…TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using…”
Journal Article