
[M] Learning temporally-extended state and action representations in Deep Reinforcement Learning

Master Assignment

Learning temporally-extended state and action representations in Deep Reinforcement Learning

Type: Master CS

Period: TBD

Student: (Unassigned)

If you are interested, please contact:

Introduction

Reinforcement Learning (RL) [1] is the branch of Machine Learning (ML) that studies a computational approach to learning behaviors from interaction with the world in order to achieve goals. RL builds on the theory of Markov Decision Processes (MDPs) and on the nature-inspired notion of an intelligent agent that learns to act by interacting with an unknown environment and adjusting its behavior based on the consequences of the actions taken, consequences quantified by a scalar signal called the reward.
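To make this interaction loop concrete, the sketch below shows an agent perceiving observations, taking actions, and accumulating the scalar reward signal. It assumes the Gymnasium API; the CartPole-v1 environment and the random action choice are placeholders for illustration only and are not part of the assignment.

```python
# Minimal sketch of the RL agent-environment interaction loop.
# Assumes the Gymnasium API; the random policy is a stand-in for a learned one.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
episode_return = 0.0

for t in range(500):
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward            # consequences quantified by the reward signal
    if terminated or truncated:
        break

print(f"episode return: {episode_return}")
env.close()
```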

RL has traditionally relied on feature engineering to face the curse of dimensionality of state and action spaces and to make algorithms computationally tractable. Feature engineering comprises all methods that use human intuition, intelligence, and knowledge to hand-craft features from data. Examples of feature extraction algorithms are edge and corner detection, but also, in the context of RL, smart discretization of state and action spaces.
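As an illustration of such hand-crafted discretization, the sketch below bins a continuous observation into a single index of a small discrete state space that a tabular method could use. The bin counts and value ranges are arbitrary assumptions made for the example, not part of the assignment.

```python
# Illustrative hand-crafted discretization of a continuous state
# (e.g. a 4-dimensional observation) into a single table index.
# Bin counts and value ranges are assumptions chosen for the example.
import numpy as np

BINS = np.array([6, 6, 12, 12])              # bins per dimension (assumed)
LOW  = np.array([-2.4, -3.0, -0.21, -3.0])   # assumed lower bounds
HIGH = np.array([ 2.4,  3.0,  0.21,  3.0])   # assumed upper bounds

def discretize(obs: np.ndarray) -> int:
    """Map a continuous observation to one index of a small discrete state space."""
    ratios = (np.clip(obs, LOW, HIGH) - LOW) / (HIGH - LOW)
    idx = np.minimum((ratios * BINS).astype(int), BINS - 1)   # per-dimension bin
    return int(np.ravel_multi_index(idx, BINS))               # flatten to one index

# A tabular Q-function over the hand-crafted discrete space:
q_table = np.zeros((int(np.prod(BINS)), 2))   # 6*6*12*12 states x 2 actions
```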

Inspired by the successes of Deep Learning (DL), Deep Reinforcement Learning (DRL) [2] was born as an extension of RL that tackles high-dimensional input data and partial observability through Deep Neural Networks. DRL has achieved outstanding successes in learning policies directly from high-dimensional observations. However, these solutions come at the price of low sample efficiency and learning instabilities.

Motivations

In this project, we focus on improving the sample efficiency of DRL algorithms when applied to high-dimensional control problems. The reward signal alone is often insufficient for learning good representations and, consequently, a good policy without huge amounts of data. Because we cannot rely on labelled data, standard supervised DL methods cannot be used directly. Therefore, DRL has recently shifted its focus toward unsupervised learning of data representations [3]. Learning meaningful representations of data without supervision is a major open challenge of DL research.

In DRL, the agent does not know the laws governing the world and can only perceive partial information, i.e. observations, through its own perception, e.g. via sensors. Given the observation stream, we aim at learning the low-dimensional state and action representations that best incorporate all the information relevant for learning to solve the task.
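A minimal sketch of what learning a low-dimensional state representation from the observation stream can look like is given below: an encoder trained with a latent forward-dynamics loss, loosely in the spirit of [4] and [5]. The dimensions, architectures, and loss are illustrative assumptions, not the method to be developed in the assignment.

```python
# Minimal sketch: learn a low-dimensional state representation from observations
# via a latent forward-dynamics loss (loosely in the spirit of [4] and [5]).
# All sizes, architectures, and training details are illustrative assumptions.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 64, 4, 8   # assumed dimensions

encoder = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
dynamics = nn.Sequential(nn.Linear(LATENT_DIM + ACT_DIM, 128), nn.ReLU(),
                         nn.Linear(128, LATENT_DIM))
opt = torch.optim.Adam(list(encoder.parameters()) + list(dynamics.parameters()), lr=1e-3)

def representation_loss(obs, act, next_obs):
    """Predict the next latent state from the current latent state and the action."""
    z, z_next = encoder(obs), encoder(next_obs)
    z_pred = dynamics(torch.cat([z, act], dim=-1))
    return ((z_pred - z_next.detach()) ** 2).mean()

# One gradient step on a hypothetical batch of transitions (random data here):
obs, act, next_obs = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM), torch.randn(32, OBS_DIM)
loss = representation_loss(obs, act, next_obs)
opt.zero_grad()
loss.backward()
opt.step()
```

On its own, such a purely predictive objective can collapse to a trivial embedding, which is one reason the cited works combine it with additional terms, e.g. reward-related or contrastive losses.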

Goals

This project builds on the concept of learning state and action abstractions proposed in [4], [5], [6] and aims at further investigating the relation between state and action representations. Possible research directions are:

  1. Learning temporally-extended action representations and studying the advantages of learning “abstract” policies (see the sketch after this list).
  2. Proper treatment of the ill-posed inverse problem of learning the mapping from latent actions back to the original action space.
  3. Investigation of the interplay between temporally-extended state and action representations.
  4. Representation learning with Graph Neural Networks [7].
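
To make directions 1 and 2 more concrete, the sketch below encodes a short sequence of k low-level actions into a single latent, temporally-extended action and decodes it back. This autoencoder-style setup, including all dimensions and the reconstruction loss, is an assumption made for illustration only, not the assignment's prescribed method.

```python
# Illustrative sketch for directions 1 and 2: encode a sequence of k low-level
# actions into one latent "temporally-extended" action and decode it back.
# Dimensions, architectures, and the loss are assumptions, not the assignment's method.
import torch
import torch.nn as nn

K, ACT_DIM, LATENT_ACT_DIM = 4, 2, 3   # k steps of 2-D actions -> 3-D latent action

action_encoder = nn.Sequential(nn.Linear(K * ACT_DIM, 64), nn.ReLU(),
                               nn.Linear(64, LATENT_ACT_DIM))
# The decoder approximates the inverse mapping from latent actions back to the
# original action sequence; direction 2 concerns how to treat this inverse properly.
action_decoder = nn.Sequential(nn.Linear(LATENT_ACT_DIM, 64), nn.ReLU(),
                               nn.Linear(64, K * ACT_DIM))
opt = torch.optim.Adam(list(action_encoder.parameters()) +
                       list(action_decoder.parameters()), lr=1e-3)

action_seq = torch.randn(32, K * ACT_DIM)      # hypothetical batch of action sequences
latent_action = action_encoder(action_seq)     # "abstract" action an abstract policy could output
recon = action_decoder(latent_action)
loss = ((recon - action_seq) ** 2).mean()      # reconstruction objective (one possible choice)
opt.zero_grad()
loss.backward()
opt.step()
```

Note that several distinct action sequences may map to the same latent action, which is precisely why the inverse mapping in direction 2 is ill-posed and requires careful treatment.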

References:

[1] Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.

[2] Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).

[3] Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.

[4] Whitney, William, et al. "Dynamics-aware embeddings." arXiv preprint arXiv:1908.09357 (2019).

[5] Botteghi, Nicolò, et al. "Low-Dimensional State and Action Representation Learning with MDP Homomorphism Metrics." arXiv preprint arXiv:2107.01677 (2021).

[6] Botteghi, Nicolò. Robotics Deep Reinforcement Learning with Loose Prior Knowledge. 2021.

[7] Battaglia, Peter W., et al. "Relational inductive biases, deep learning, and graph networks." arXiv preprint arXiv:1806.01261 (2018).