Search Results - "Pires, Bernardo Avila"

  1.

    Pseudo-MDPs and factored linear action models by Yao, Hengshuai, Szepesvári, Csaba, Pires, Bernardo Avila, Zhang, Xinhua

    “…In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a…”
    Conference Proceeding
  2.

    Understanding plasticity in neural networks by Lyle, Clare, Zheng, Zeyu, Nikishin, Evgenii, Pires, Bernardo Avila, Pascanu, Razvan, Dabney, Will

    Published 02-03-2023
    “…Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness…”
    Journal Article
  3.

    Hierarchical Reinforcement Learning in Complex 3D Environments by Pires, Bernardo Avila, Behbahani, Feryal, Soyer, Hubert, Nikiforou, Kyriacos, Keck, Thomas, Singh, Satinder

    Published 28-02-2023
    “…Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction,…”
    Journal Article
  4.

    A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning by Khetarpal, Khimya, Guo, Zhaohan Daniel, Pires, Bernardo Avila, Tang, Yunhao, Lyle, Clare, Rowland, Mark, Heess, Nicolas, Borsa, Diana, Guez, Arthur, Dabney, Will

    Published 04-06-2024
    “…Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent…”
    Journal Article
  5.
  6.

    Human Alignment of Large Language Models through Online Preference Optimisation by Calandriello, Daniele, Guo, Daniel, Munos, Remi, Rowland, Mark, Tang, Yunhao, Pires, Bernardo Avila, Richemond, Pierre Harvey, Lan, Charline Le, Valko, Michal, Liu, Tianqi, Joshi, Rishabh, Zheng, Zeyu, Piot, Bilal

    Published 13-03-2024
    “…Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human…”
    Journal Article
  7.

    Neural Recursive Belief States in Multi-Agent Reinforcement Learning by Moreno, Pol, Hughes, Edward, McKee, Kevin R, Pires, Bernardo Avila, Weber, Théophane

    Published 03-02-2021
    “…In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned…”
    Journal Article
  8.

    BYOL-Explore: Exploration by Bootstrapped Prediction by Guo, Zhaohan Daniel, Thakoor, Shantanu, Pîslar, Miruna, Pires, Bernardo Avila, Altché, Florent, Tallec, Corentin, Saade, Alaa, Calandriello, Daniele, Grill, Jean-Bastien, Tang, Yunhao, Valko, Michal, Munos, Rémi, Azar, Mohammad Gheshlaghi, Piot, Bilal

    Published 16-06-2022
    “…We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a…”
    Journal Article
  9.

    Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning by Guo, Daniel, Pires, Bernardo Avila, Piot, Bilal, Grill, Jean-bastien, Altché, Florent, Munos, Rémi, Azar, Mohammad Gheshlaghi

    Published 30-04-2020
    “…Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and…”
    Journal Article
  10.

    Geometric Entropic Exploration by Guo, Zhaohan Daniel, Azar, Mohammad Gheshlaghi, Saade, Alaa, Thakoor, Shantanu, Piot, Bilal, Pires, Bernardo Avila, Valko, Michal, Mesnard, Thomas, Lattimore, Tor, Munos, Rémi

    Published 06-01-2021
    “…Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a…”
    Journal Article
  11.

    World Discovery Models by Azar, Mohammad Gheshlaghi, Piot, Bilal, Pires, Bernardo Avila, Grill, Jean-Bastien, Altché, Florent, Munos, Rémi

    Published 20-02-2019
    “…As humans we are driven by a strong desire for seeking novelty in our world. Also upon observing a novel pattern we are capable of refining our understanding…”
    Journal Article
  12.

    Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction by Azabou, Mehdi, Azar, Mohammad Gheshlaghi, Liu, Ran, Lin, Chi-Heng, Johnson, Erik C, Bhaskaran-Nair, Kiran, Dabagia, Max, Avila-Pires, Bernardo, Kitchell, Lindsey, Hengen, Keith B, Gray-Roncal, William, Valko, Michal, Dyer, Eva L

    Published 19-02-2021
    “…State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a…”
    Journal Article
  13.

    Statistical Linear Estimation with Penalized Estimators: an Application to Reinforcement Learning by Pires, Bernardo Avila, Szepesvári, Csaba

    Published 27-06-2012
    “…Motivated by value function estimation in reinforcement learning, we study statistical linear inverse problems, i.e., problems where the coefficients of a…”
    Journal Article
  14.

    Bootstrap your own latent: A new approach to self-supervised Learning by Grill, Jean-Bastien, Strub, Florian, Altché, Florent, Tallec, Corentin, Richemond, Pierre H, Buchatskaya, Elena, Doersch, Carl, Pires, Bernardo Avila, Guo, Zhaohan Daniel, Azar, Mohammad Gheshlaghi, Piot, Bilal, Kavukcuoglu, Koray, Munos, Rémi, Valko, Michal

    Published 13-06-2020
    “…We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to…”
    Journal Article
  15.

    Off-policy Distributional Q($\lambda$): Distributional RL without Importance Sampling by Tang, Yunhao, Rowland, Mark, Munos, Rémi, Pires, Bernardo Ávila, Dabney, Will

    Published 08-02-2024
    “…We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional…”
    Journal Article
  16.

    DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm by Tang, Yunhao, Kozuno, Tadashi, Rowland, Mark, Harutyunyan, Anna, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal

    Published 29-05-2023
    “…Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the…”
    Journal Article
  17.

    Understanding the performance gap between online and offline alignment algorithms by Tang, Yunhao, Guo, Daniel Zhaohan, Zheng, Zeyu, Calandriello, Daniele, Cao, Yuan, Tarassov, Eugene, Munos, Rémi, Pires, Bernardo Ávila, Valko, Michal, Cheng, Yong, Dabney, Will

    Published 14-05-2024
    “…Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, rising popularity in offline…”
    Journal Article
  18.

    The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning by Tang, Yunhao, Rowland, Mark, Munos, Rémi, Pires, Bernardo Ávila, Dabney, Will, Bellemare, Marc G

    Published 15-07-2022
    “…We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our…”
    Journal Article
  19.

    Generalized Preference Optimization: A Unified Approach to Offline Alignment by Tang, Yunhao, Guo, Zhaohan Daniel, Zheng, Zeyu, Calandriello, Daniele, Munos, Rémi, Rowland, Mark, Richemond, Pierre Harvey, Valko, Michal, Pires, Bernardo Ávila, Piot, Bilal

    Published 08-02-2024
    “…Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose…”
    Journal Article
  20.

    Multiclass Classification Calibration Functions by Pires, Bernardo Ávila, Szepesvári, Csaba

    Published 20-09-2016
    “…In this paper we refine the process of computing calibration functions for a number of multiclass classification surrogate losses. Calibration functions are a…”
    Journal Article