Search Results - "Taufeeque, Mohammad"

  Showing results 1-4 of 4
  1. Codebook Features: Sparse and Discrete Interpretability for Neural Networks by Tamkin, Alex, Taufeeque, Mohammad, Goodman, Noah D

    Published 26-10-2023
    “…Understanding neural networks is challenging in part because of the dense, continuous nature of their hidden states. We explore whether we can train neural…”
    Journal Article
  2. Exploiting Novel GPT-4 APIs by Pelrine, Kellin, Taufeeque, Mohammad, Zając, Michał, McLean, Euan, Gleave, Adam

    Published 21-12-2023
    “…Language model attacks typically assume one of two extreme threat models: full white-box access to model weights, or black-box access limited to a text…”
    Journal Article
  3. Planning in a recurrent neural network that plays Sokoban by Taufeeque, Mohammad, Quirke, Philip, Li, Maximilian, Cundy, Chris, Tucker, Aaron David, Gleave, Adam, Garriga-Alonso, Adrià

    Published 22-07-2024
    “…How a neural network (NN) generalizes to novel situations depends on whether it has learned to select actions heuristically or via a planning process. "An…”
    Journal Article
  4. imitation: Clean Imitation Learning Implementations by Gleave, Adam, Taufeeque, Mohammad, Rocamonde, Juan, Jenner, Erik, Wang, Steven H, Toyer, Sam, Ernestus, Maximilian, Belrose, Nora, Emmons, Scott, Russell, Stuart

    Published 21-11-2022
    “…imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch. We include three inverse reinforcement learning (IRL)…”
    Journal Article