AISB event Bulletin Item

ICML-07 Tutorial on Bayesian Methods for Reinforcement Learning

Corvallis, Oregon, USA
20 June 2007


Although Bayesian methods for Reinforcement Learning can be traced
back to the 1960s (Howard's work in Operations Research), they have
only been used sporadically in modern Reinforcement
Learning. This is in part because non-Bayesian approaches tend to be
much simpler to work with. However, recent advances have shown that
Bayesian approaches do not need to be as complex as initially thought
and offer several theoretical advantages. For instance, by keeping
track of full distributions (instead of point estimates) over the
unknowns, Bayesian approaches permit a more comprehensive
quantification of the uncertainty regarding the transition
probabilities, the rewards, the value function parameters and the
policy parameters. Such distributional information can be used to
optimize (in a principled way) the classic exploration/exploitation
tradeoff, which can speed up the learning process. Similarly, active
learning for reinforcement learning can be naturally optimized. The
gradient of performance with respect to value function
and/or policy parameters can also be estimated more accurately while using
less data. Bayesian approaches also facilitate the encoding of prior
knowledge and the explicit formulation of domain assumptions. The
primary goal of this tutorial is to raise the awareness of the
research community with regard to Bayesian methods, their properties
and potential benefits for the advancement of Reinforcement Learning.
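
To make the idea of tracking full distributions rather than point estimates
concrete, here is a minimal sketch for the transition probabilities of a small
discrete problem. It is not taken from the tutorial material: the class name
DirichletTransitionModel, its methods and the prior_count parameter are all
hypothetical, and a Dirichlet prior over next-state probabilities is assumed
purely for illustration.

```python
import random


class DirichletTransitionModel:
    """Posterior over P(s' | s, a) for a small discrete problem (illustrative only)."""

    def __init__(self, n_states, n_actions, prior_count=1.0):
        # One Dirichlet per (state, action) pair; prior_count is an assumed
        # way of encoding prior knowledge as pseudo-counts.
        self.counts = {
            (s, a): [prior_count] * n_states
            for s in range(n_states)
            for a in range(n_actions)
        }

    def update(self, s, a, s_next):
        # Conjugate update: each observed transition adds one count.
        self.counts[(s, a)][s_next] += 1.0

    def mean(self, s, a):
        # Posterior mean, i.e. the single point estimate of the distribution.
        alpha = self.counts[(s, a)]
        total = sum(alpha)
        return [x / total for x in alpha]

    def variance(self, s, a):
        # Marginal posterior variance of each transition probability;
        # this is the uncertainty that a point estimate discards.
        alpha = self.counts[(s, a)]
        total = sum(alpha)
        return [x * (total - x) / (total * total * (total + 1.0)) for x in alpha]

    def sample(self, s, a, rng=random):
        # Draw one plausible transition distribution from the posterior
        # (normalised Gamma draws yield a Dirichlet sample).
        draws = [rng.gammavariate(x, 1.0) for x in self.counts[(s, a)]]
        total = sum(draws)
        return [d / total for d in draws]


model = DirichletTransitionModel(n_states=3, n_actions=2)
for _ in range(8):
    model.update(0, 1, 2)  # action 1 in state 0 led to state 2 eight times

print(model.mean(0, 1))      # point estimate for the visited pair
print(model.variance(0, 1))  # uncertainty shrinks where data was seen
print(model.variance(0, 0))  # untried pair keeps its prior uncertainty
```

In this sketch the variance for the untried pair stays larger than for the
visited one, which is the kind of distributional information an exploration
strategy could use, for example by acting on a sampled model instead of the
mean.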


1. Introduction to Reinforcement Learning and Bayesian learning

2. History of Bayesian RL

3. Model-based Bayesian RL
3.1 Policy optimization techniques
3.2 Encoding of domain knowledge
3.3 Exploration/exploitation tradeoff and active learning
3.4 Bayesian imitation learning in RL
3.5 Bayesian multi-agent coordination and coalition formation in RL

4. Model-free Bayesian RL
4.1 Gaussian process temporal difference (GPTD)
4.2 Gaussian process SARSA
4.3 Bayesian policy gradient
4.4 Bayesian actor-critic algorithms

5. Demo
5.1 Control of an octopus arm using GPTD


Pascal Poupart, University of Waterloo

Pascal Poupart received a Ph.D. degree in Computer Science from the
University of Toronto in 2005. Since August 2004, he has been an Assistant
Professor in the David R. Cheriton School of Computer Science at the
University of Waterloo. Poupart's research focuses on the design and
analysis of scalable algorithms for sequential decision making under
uncertainty (including Bayesian reinforcement learning), with
application to assistive technologies in eldercare, spoken dialogue
management and information retrieval. He has served on the program
committee of several international conferences, including AAMAS
(2006, 2007), UAI (2005, 2006, 2007), ICML (2007), AAAI (2005, 2006,
2007), NIPS (2007) and AISTATS (2007).

Mohammad Ghavamzadeh, University of Alberta

Mohammad Ghavamzadeh received a Ph.D. degree in computer science
from the University of Massachusetts Amherst in 2005. Since September
2005 he has been a postdoctoral fellow at the Department of Computing
Science at the University of Alberta, working with Prof. Richard
Sutton. The main objective of his research is to investigate the
principles of scalable decision-making grounded by real-world
applications. In the last two years, Ghavamzadeh's research has been
mostly focused on using recent advances in statistical machine
learning, especially Bayesian reasoning and kernel methods, to develop
more scalable reinforcement learning algorithms.

Yaakov Engel, University of Alberta

Yaakov Engel received a Ph.D. degree from the Hebrew University of
Jerusalem in 2005. Since April 2005 he has been a postdoctoral fellow
with the Alberta Ingenuity Centre for Machine Learning (AICML) at the
Department of Computing Science at the University of Alberta.