Search Results - "Taufeeque, Mohammad"

1
Codebook Features: Sparse and Discrete Interpretability for Neural Networks by Tamkin, Alex; Taufeeque, Mohammad; Goodman, Noah D

Published 26-10-2023
“…Understanding neural networks is challenging in part because of the dense, continuous nature of their hidden states. We explore whether we can train neural…”

Journal Article

2
Exploiting Novel GPT-4 APIs by Pelrine, Kellin; Taufeeque, Mohammad; Zając, Michał; McLean, Euan; Gleave, Adam

Published 21-12-2023
“…Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text…”

Journal Article

3
Planning in a recurrent neural network that plays Sokoban by Taufeeque, Mohammad; Quirke, Philip; Li, Maximilian; Cundy, Chris; Tucker, Aaron David; Gleave, Adam; Garriga-Alonso, Adrià

Published 22-07-2024
“…How a neural network (NN) generalizes to novel situations depends on whether it has learned to select actions heuristically or via a planning process. "An…”

Journal Article

4
imitation: Clean Imitation Learning Implementations by Gleave, Adam; Taufeeque, Mohammad; Rocamonde, Juan; Jenner, Erik; Wang, Steven H; Toyer, Sam; Ernestus, Maximilian; Belrose, Nora; Emmons, Scott; Russell, Stuart

Published 21-11-2022
“…imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL)…”

Journal Article

