Search Results - "Pires, Bernardo Avila"
1
Pseudo-MDPs and factored linear action models
Published in 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 01-12-2014. “…In this paper we introduce the concept of pseudo-MDPs to develop abstractions. Pseudo-MDPs relax the requirement that the transition kernel has to be a…”
Conference Proceeding
2
Understanding plasticity in neural networks
Published 02-03-2023. “…Plasticity, the ability of a neural network to quickly change its predictions in response to new information, is essential for the adaptability and robustness…”
Journal Article
3
Hierarchical Reinforcement Learning in Complex 3D Environments
Published 28-02-2023. “…Hierarchical Reinforcement Learning (HRL) agents have the potential to demonstrate appealing capabilities such as planning and exploration with abstraction,…”
Journal Article
4
A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning
Published 04-06-2024. “…Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents. Self-predictive learning provides means to jointly learn a latent…”
Journal Article
5
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Published 29-05-2024. “…The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference…”
Journal Article
6
Human Alignment of Large Language Models through Online Preference Optimisation
Published 13-03-2024. “…Ensuring alignment of language models' outputs with human preferences is critical to guarantee a useful, safe, and pleasant user experience. Thus, human…”
Journal Article
7
Neural Recursive Belief States in Multi-Agent Reinforcement Learning
Published 03-02-2021. “…In multi-agent reinforcement learning, the problem of learning to act is particularly difficult because the policies of co-players may be heavily conditioned…”
Journal Article
8
BYOL-Explore: Exploration by Bootstrapped Prediction
Published 16-06-2022. “…We present BYOL-Explore, a conceptually simple yet general approach for curiosity-driven exploration in visually-complex environments. BYOL-Explore learns a…”
Journal Article
9
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning
Published 30-04-2020. “…Learning a good representation is an essential component for deep reinforcement learning (RL). Representation learning is especially important in multitask and…”
Journal Article
10
Geometric Entropic Exploration
Published 06-01-2021. “…Exploration is essential for solving complex Reinforcement Learning (RL) tasks. Maximum State-Visitation Entropy (MSVE) formulates the exploration problem as a…”
Journal Article
11
World Discovery Models
Published 20-02-2019. “…As humans we are driven by a strong desire for seeking novelty in our world. Also upon observing a novel pattern we are capable of refining our understanding…”
Journal Article
12
Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction
Published 19-02-2021. “…State-of-the-art methods for self-supervised learning (SSL) build representations by maximizing the similarity between different transformed "views" of a…”
Journal Article
13
Statistical Linear Estimation with Penalized Estimators: an Application to Reinforcement Learning
Published 27-06-2012. “…Motivated by value function estimation in reinforcement learning, we study statistical linear inverse problems, i.e., problems where the coefficients of a…”
Journal Article
14
Bootstrap your own latent: A new approach to self-supervised Learning
Published 13-06-2020. “…We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to…”
Journal Article
15
Off-policy Distributional Q($\lambda$): Distributional RL without Importance Sampling
Published 08-02-2024. “…We introduce off-policy distributional Q($\lambda$), a new addition to the family of off-policy distributional evaluation algorithms. Off-policy distributional…”
Journal Article
16
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
Published 29-05-2023. “…Multi-step learning applies lookahead over multiple time steps and has proved valuable in policy evaluation settings. However, in the optimal control case, the…”
Journal Article
17
Understanding the performance gap between online and offline alignment algorithms
Published 14-05-2024. “…Reinforcement learning from human feedback (RLHF) is the canonical framework for large language model alignment. However, rising popularity in offline…”
Journal Article
18
The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning
Published 15-07-2022. “…We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our…”
Journal Article
19
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Published 08-02-2024. “…Offline preference optimization allows fine-tuning large models directly from offline data, and has proved effective in recent alignment practices. We propose…”
Journal Article
20
Multiclass Classification Calibration Functions
Published 20-09-2016. “…In this paper we refine the process of computing calibration functions for a number of multiclass classification surrogate losses. Calibration functions are a…”
Journal Article