Steven James | Reinforcement Learning

Portable Symbolic Representations

The focus of my current work is designing autonomous agents that are capable of learning high-level concepts that are abstracted away from the low-level perceptual details. Take the example of robots, which face the difficult task of generating behaviours while sensing and acting in a high-dimensional and continuous space. Planning at this low level is typically not feasible — the robot’s innate action space involves directly actuating motors at a high frequency, but it would take thousands of such actuations to accomplish most useful goals. Similarly, sensors provide very high-dimensional signals that are often continuous and noisy, further exacerbating matters.

This is clearly a difficult problem, but it's also one that humans are able to overcome quite easily. We are able to do so because, even though our actions consist of small muscle twitches, we do not reason about the world or plan at this level of detail. Rather, we represent the world using abstract concepts, which we then use to construct plans. This is a desirable approach for agents too, but learning an appropriate abstract representation that is both sound and useful is difficult.

My thesis revolves around not only learning these abstract representations of an agent's environment, but learning representations that are portable. That is, we should be able to transfer the learned abstractions between different tasks, so that the agent need not learn each new task from scratch. You can read more about it in the extended abstract below:

Extended Abstract

Rollout Policies in Monte Carlo Tree Search

Our previous work involved an empirical analysis of Monte Carlo Tree Search (MCTS). One peculiarity of this algorithm is that the manner in which simulations are performed (the rollout policy) can have an unexpected effect on the algorithm's overall performance. In particular, intelligent simulations do not necessarily result in a stronger MCTS algorithm. Why is that the case?

When making a choice of which action to select, we ultimately only care about the actions' relative ordering: their exact values are not important as long as the best action is ranked higher than all the others. Our results indicate that under certain "smooth" conditions, a uniformly random rollout policy will actually preserve this ordering, which explains why it performs so well. Conversely, stronger policies can result in disastrous outcomes if their bias is too large and their variance too small, because the search then settles confidently on the wrong ordering. You can read more about these experiments in the papers below:

Workshop Paper Conference Paper
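
The ordering argument above can be illustrated with a toy simulation. This is only a sketch and not the experimental setup from the papers: it assumes each rollout returns an action's true value plus a fixed per-action bias plus Gaussian noise, and every name in it (pick_best, success_rate, TRUE_VALUES) is hypothetical. It compares a noisy but unbiased estimator with a low-variance estimator whose bias happens to favour a suboptimal action.

```python
import random
import statistics


def pick_best(true_values, bias, noise_std, n_rollouts, rng):
    """Average noisy rollout returns per action and return the argmax index."""
    estimates = []
    for value, b in zip(true_values, bias):
        # Assumed rollout model: true value + fixed bias + Gaussian noise.
        samples = [value + b + rng.gauss(0.0, noise_std) for _ in range(n_rollouts)]
        estimates.append(statistics.fmean(samples))
    return max(range(len(estimates)), key=estimates.__getitem__)


def success_rate(true_values, bias, noise_std, n_rollouts=200, trials=500, seed=0):
    """Fraction of trials in which the truly best action is ranked first."""
    rng = random.Random(seed)
    best = max(range(len(true_values)), key=true_values.__getitem__)
    hits = sum(
        pick_best(true_values, bias, noise_std, n_rollouts, rng) == best
        for _ in range(trials)
    )
    return hits / trials


# Illustrative values only: action 1 is the best action.
TRUE_VALUES = [0.50, 0.60, 0.40]

# High variance, no bias: averaging many rollouts recovers the ordering.
noisy_unbiased = success_rate(TRUE_VALUES, bias=[0.0, 0.0, 0.0], noise_std=0.5)

# Low variance, but a bias that favours action 0 over the best action.
confident_biased = success_rate(TRUE_VALUES, bias=[0.15, -0.05, 0.0], noise_std=0.05)

print("noisy, unbiased rollouts:  ", noisy_unbiased)
print("low-variance, biased rollouts:", confident_biased)
```

In this toy model the unbiased estimator picks the best action in most trials despite its noise, while the biased one almost never does, since low variance leaves no chance for sampling to undo the flipped ordering.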