[Illinois] MCB 493 Lecture 11: Temporal-Difference Learning and Reward Prediction

By Thomas J. Anastasio

University of Illinois at Urbana-Champaign


Abstract

Temporal-difference learning can train neural networks to estimate the future value of a current state and to simulate the responses of neurons involved in reward processing.
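As a concrete illustration of the first point, the following short Python sketch estimates state values in a small stochastic gridworld with tabular TD(0) under a random policy. The grid size, reward placement, starting state, and learning parameters are illustrative assumptions, not the ones used in the lecture.

# Minimal sketch of tabular TD(0) state-value learning on a small
# stochastic gridworld with a random policy. The grid layout, reward
# placement, and parameters are assumptions chosen for illustration;
# they are not taken from the lecture.
import random

ROWS, COLS = 3, 4          # assumed grid dimensions
GOAL = (0, 3)              # assumed terminal state with reward +1
ALPHA, GAMMA = 0.1, 0.9    # assumed learning rate and discount factor
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def step(state):
    """Take a random move; bumping into a wall leaves the state unchanged."""
    dr, dc = random.choice(MOVES)
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < ROWS and 0 <= c < COLS:
        state = (r, c)
    reward = 1.0 if state == GOAL else 0.0
    return state, reward

V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}

for episode in range(1000):
    s = (ROWS - 1, 0)                      # start in the corner opposite the goal
    while s != GOAL:
        s_next, r = step(s)
        target = r + GAMMA * V[s_next]     # one-step lookahead estimate
        V[s] += ALPHA * (target - V[s])    # TD(0) update toward the target
        s = s_next

for row in range(ROWS):
    print(["%.2f" % V[(row, col)] for col in range(COLS)])

Each update nudges the value of the current state toward the one-step target r + γV(s'), so value information propagates backward from the rewarded state over repeated episodes.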

11.1 Learning State Values Using Iterative Dynamic Programming

11.2 Learning State Values Using Least Mean Squares

11.3 Learning State Values Using the Method of Temporal Differences

11.4 Simulating Dopamine Neuron Responses Using Temporal-Difference Learning

11.5 Temporal-Difference Learning as a Form of Supervised Learning
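Sections 11.3 and 11.4 connect the temporal-difference error itself to the reward-prediction responses of midbrain dopamine neurons: with training, the error moves from the time of the reward to the time of the cue that predicts it (compare Figures 11.10 and 11.13). The Python sketch below reproduces that shift with a tapped-delay-line stimulus representation; the trial length, cue and reward times, learning rate, and discount factor are assumptions made for illustration, not the network of Figure 11.12.

# Sketch of the temporal-difference error acting as a reward-prediction
# signal within a trial: a cue at time step 2 is followed by a reward at
# time step 8. Trial length, event times, and parameters are assumed.
import numpy as np

T = 12                     # time steps per trial (assumed)
CUE_T, REWARD_T = 2, 8     # cue and reward times (assumed)
ALPHA, GAMMA = 0.3, 1.0    # assumed learning rate and discount factor

w = np.zeros(T)            # one weight per post-cue delay tap

def trial(w, learn=True):
    """Run one trial; return the TD error at each time step."""
    x_prev = np.zeros(T)
    v_prev = 0.0
    deltas = np.zeros(T)
    for t in range(T):
        x = np.zeros(T)
        if t >= CUE_T:
            x[t - CUE_T] = 1.0          # delay-line representation of the cue
        v = float(w @ x)                 # predicted value at time t
        r = 1.0 if t == REWARD_T else 0.0
        delta = r + GAMMA * v - v_prev   # TD (reward-prediction) error
        if learn:
            w += ALPHA * delta * x_prev  # adjust the prediction made at t-1
        deltas[t] = delta
        x_prev, v_prev = x, v
    return deltas

for i in range(200):
    d = trial(w)
print(np.round(d, 2))  # after training, the error peaks at the cue, not the reward

Before learning, the prediction error is large at the time of reward; after repeated trials the delay-line weights come to predict the reward, the error at reward time falls to zero, and an error of the same size appears at the cue.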

Cite this work

Researchers should cite this work as follows:

  • Thomas J. Anastasio (2013), "[Illinois] MCB 493 Lecture 11: Temporal-Difference Learning and Reward Prediction," https://nanohub.org/resources/18947.


Location

NCSA Auditorium, University of Illinois at Urbana-Champaign, Urbana, IL

Submitter

NanoBio Node

University of Illinois at Urbana-Champaign

Video segments (with start times)

  • Neural Systems Modeling (0:00)
  • Figure 11.1 A stochastic gridworld (6:40)
  • Figure 11.2 The exact values of the states in the stochastic gridworld (18:24)
  • Figure 11.3 Estimated state values for the gridworld after one epoch of iterative dynamic programming (32:44)
  • Figure 11.4 Estimated state values and root-mean-square error over 100 epochs of iterative dynamic programming in the gridworld (35:08)
  • Figure 11.5 Estimated state values for the gridworld after one epoch of least-mean-squares learning (44:02)
  • Figure 11.6 Estimated state values and root-mean-square error over 1000 epochs of least-mean-squares learning in the gridworld (45:53)
  • Figure 11.7 Estimated state values for the gridworld after one epoch of temporal-difference learning (54:00)
  • Figure 11.8 Estimated state values and root-mean-square error over 1000 epochs of temporal-difference learning in the gridworld (54:59)
  • Figure 11.9 The midbrain dopamine pathways (56:29)
  • Figure 11.10 The activity of a midbrain dopamine neuron transfers from a reward to a cue that predicts the reward (58:27)
  • Figure 11.11 The activity of a midbrain dopamine neuron is enhanced if a reward is received, and is suppressed if an expected reward is not received (1:01:12)
  • Figure 11.12 Temporal-difference learning implemented in a neural network model of the midbrain dopamine system (1:02:16)
  • Figure 11.13 Simulating the responses of midbrain dopamine neurons using temporal-difference learning (1:17:19)
  • Figure 11.14 Reverse replay of rat hippocampal place cell activity (1:24:51)
  • Figure 11.14 Reverse replay of rat hippocampal place cell activity (Part 1) (1:25:30)
  • Figure 11.14 Reverse replay of rat hippocampal place cell activity (Part 2) (1:25:45)