Search Results - "Taufeeque, Mohammad"
1
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Published 26-10-2023
“…Understanding neural networks is challenging in part because of the dense, continuous nature of their hidden states. We explore whether we can train neural…”
Journal Article
2
Exploiting Novel GPT-4 APIs
Published 21-12-2023
“…Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text…”
Journal Article
3
Planning in a recurrent neural network that plays Sokoban
Published 22-07-2024
“…How a neural network (NN) generalizes to novel situations depends on whether it has learned to select actions heuristically or via a planning process. "An…”
Journal Article
4
imitation: Clean Imitation Learning Implementations
Published 21-11-2022
“…imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL)…”
Journal Article