Course Highlights
  • Apply gradient-based supervised machine learning methods to reinforcement learning
  • Understand reinforcement learning on a technical level
  • Understand the relationship between reinforcement learning and psychology
  • Implement 17 different reinforcement learning algorithms
Curriculum

5 Topics
Introduction
Course Outline and Big Picture
Where to get the Code
How to Succeed in this Course
Warmup

26 Topics
Section Introduction: The Explore-Exploit Dilemma
Applications of the Explore-Exploit Dilemma
Epsilon-Greedy Theory
Calculating a Sample Mean (pt 1)
Epsilon-Greedy Beginner's Exercise Prompt
Designing Your Bandit Program
Epsilon-Greedy in Code
Comparing Different Epsilons
Optimistic Initial Values Theory
Optimistic Initial Values Beginner's Exercise Prompt
Optimistic Initial Values Code
UCB1 Theory
UCB1 Beginner's Exercise Prompt
UCB1 Code
Bayesian Bandits / Thompson Sampling Theory (pt 1)
Bayesian Bandits / Thompson Sampling Theory (pt 2)
Thompson Sampling Beginner's Exercise Prompt
Thompson Sampling Code
Thompson Sampling With Gaussian Reward Theory
Thompson Sampling With Gaussian Reward Code
Exercise on Gaussian Rewards
Why don't we just use a library?
Nonstationary Bandits
Bandit Summary, Real Data, and Online Learning
(Optional) Alternative Bandit Designs
Suggestion Box

2 Topics
What is Reinforcement Learning?
From Bandits to Full Reinforcement Learning

14 Topics
MDP Section Introduction
Gridworld
Choosing Rewards
The Markov Property
Markov Decision Processes (MDPs)
Future Rewards
Value Functions
The Bellman Equation (pt 1)
The Bellman Equation (pt 2)
The Bellman Equation (pt 3)
Bellman Examples
Optimal Policy and Optimal Value Function (pt 1)
Optimal Policy and Optimal Value Function (pt 2)
MDP Summary

14 Topics
Dynamic Programming Section Introduction
Iterative Policy Evaluation
Designing Your RL Program
Gridworld in Code
Iterative Policy Evaluation in Code
Windy Gridworld in Code
Iterative Policy Evaluation for Windy Gridworld in Code
Policy Improvement
Policy Iteration
Policy Iteration in Code
Policy Iteration in Windy Gridworld
Value Iteration
Value Iteration in Code
Dynamic Programming Summary

8 Topics
Monte Carlo Intro
Monte Carlo Policy Evaluation
Monte Carlo Policy Evaluation in Code
Monte Carlo Control
Monte Carlo Control in Code
Monte Carlo Control without Exploring Starts
Monte Carlo Control without Exploring Starts in Code
Monte Carlo Summary

8 Topics
Temporal Difference Introduction
TD(0) Prediction
TD(0) Prediction in Code
SARSA
SARSA in Code
Q-Learning
Q-Learning in Code
TD Learning Section Summary

11 Topics
Approximation Methods Section Introduction
Linear Models for Reinforcement Learning
Feature Engineering
Approximation Methods for Prediction
Approximation Methods for Prediction Code
Approximation Methods for Control
Approximation Methods for Control Code
CartPole
CartPole Code
Approximation Methods Exercise
Approximation Methods Section Summary

1 Topic
This Course vs. RL Book: What's the Difference?

10 Topics
Beginners, halt! Stop here if you skipped ahead
Stock Trading Project Section Introduction
Data and Environment
How to Model Q for Q-Learning
Design of the Program
Code pt 1
Code pt 2
Code pt 3
Code pt 4
Stock Trading Project Discussion

3 Topics
Pre-Installation Check
Anaconda Environment Setup
How to Install NumPy, SciPy, Matplotlib, Pandas, IPython, Theano, and TensorFlow

4 Topics
How to Code by Yourself (part 1)
How to Code by Yourself (part 2)
Proof that using Jupyter Notebook is the same as not using it
Python 2 vs Python 3

4 Topics
How to Succeed in this Course (Long Version)
Is this for Beginners or Experts? Academic or Practical? Fast or slow-paced?
Machine Learning and AI Prerequisite Roadmap (pt 1)
Machine Learning and AI Prerequisite Roadmap (pt 2)

2 Topics
What is the Appendix?
BONUS

Artificial Intelligence: Reinforcement Learning in Python
