Completed The Fast Deep RL Course? Then you know how to solve simple problems using OpenAI Gym and Ray RLlib. You also have a solid foundation in Deep RL.
The next obvious step is to apply this knowledge to real-world problems. At this juncture, it is common to face the following hurdles.
- No premade Gym environment exists for the problem you want to attack. You must define and create your own custom environment.
- After creating a custom environment, you apply an appropriate Deep RL algorithm with the framework's default settings. But the agent doesn't seem to learn anything, or learns poorly.
In this course, you will gain the skills needed to overcome these hurdles and become effective in real-world applications.
You will learn the main ideas behind designing and implementing custom Gym environments for real-world problems.
You will also learn how to apply several performance-enhancing tricks and run Ray Tune experiments to quickly identify the most promising ones. Here are the tricks we will cover.
- Redefining the observations and actions to cast the environment as close as possible to a Markov Decision Process
- Scaling observations, actions, and rewards to standard ranges preferred by Deep Neural Nets
- Shaping reward functions to help the agent better distinguish between good and bad actions
- Preprocessing the inputs to reduce complexity while preserving critical information
- Tuning the network size and hyperparameters of the Ray RLlib algorithms to improve performance
Remember the shop inventory management problem from The Fast Deep RL Course? That's an example of a real-world problem.
By the end of the course, you will make a custom Gym environment for inventory management, shape rewards, normalize observations and actions, tune hyperparameters, and much more. By running Ray Tune experiments, you will find the best learning settings and create a Deep RL agent that performs better than classical inventory management techniques.
After solving this example problem, you will be able to use a step-by-step method to solve real-world Deep RL problems that you encounter in your industry.
If you liked The Fast Deep RL Course, I think you will like this course too. After all, real-world application is the next natural and exciting step. This course is open for pre-enrollment. All course videos are already available, and the launch is planned for September 2023.
By enrolling now,
- you support the creation of this course.
- you get immediate access to all lessons as an early supporter.
- you get a 50% discount on the planned launch price of $90.
Thank you for supporting this course!
Prerequisites
- You should be able to use Ray RLlib to solve OpenAI Gym environments. This means that you know the basic Python API of OpenAI Gym and the basic Python API of Ray RLlib. If you completed The Fast Deep RL Course, then you already know this stuff.
What will you learn?
Chapter 1
In Chapter 1, you will learn how to create custom Gym environments.
- You will be able to design observations and actions such that the environment is close to a Markov Decision Process. This gives Deep RL algorithms the best chance for learning.
- You will be able to implement custom Gym environments by inheriting the base environment class and defining the required methods, as sketched below.
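To give you a taste, here is a minimal sketch of a custom Gym environment for a toy single-item inventory problem. The class name, state variables, and reward coefficients are illustrative placeholders, not the course's exact implementation, and the sketch assumes the newer Gym API (gym >= 0.26), where reset() returns (obs, info) and step() returns a five-element tuple.

```python
# Illustrative sketch only -- a toy inventory environment, not the course's code.
# Assumes gym >= 0.26 (reset returns (obs, info), step returns a 5-tuple).
import gym
import numpy as np
from gym import spaces


class InventoryEnv(gym.Env):
    """Observe the current stock level, choose how many units to order."""

    def __init__(self, max_stock=100, max_order=20, episode_length=30):
        super().__init__()
        self.max_stock = max_stock
        self.episode_length = episode_length
        self.observation_space = spaces.Box(
            low=0.0, high=float(max_stock), shape=(1,), dtype=np.float32
        )
        self.action_space = spaces.Discrete(max_order + 1)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)          # seeds self.np_random
        self.stock = self.max_stock // 2
        self.t = 0
        return np.array([self.stock], dtype=np.float32), {}

    def step(self, action):
        demand = int(self.np_random.integers(0, 15))      # random daily demand
        available = min(self.stock + action, self.max_stock)
        sold = min(available, demand)
        self.stock = available - sold
        # revenue minus ordering and holding costs (made-up coefficients)
        reward = 5.0 * sold - 1.0 * action - 0.1 * self.stock
        self.t += 1
        truncated = self.t >= self.episode_length         # fixed-length episode
        return np.array([self.stock], dtype=np.float32), reward, False, truncated, {}
```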
Chapter 2
In Chapter 2, you will learn how to scale observations and actions, and shape rewards using Gym Wrappers.
- You will be able to modify your custom environments further by using Gym Wrappers.
- You will write wrappers for scaling observations and actions to ranges preferred by Deep Neural Nets (see the sketch after this list).
- You will write several wrappers to shape the reward function.
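As an appetizer, here is a minimal observation-scaling wrapper. It assumes a Box observation space with finite bounds; the class name is illustrative, and the course builds more elaborate wrappers (including reward shaping) on the same pattern.

```python
# Illustrative sketch only -- rescales Box observations to [0, 1].
import gym
import numpy as np


class ScaleObservation(gym.ObservationWrapper):
    """Linearly rescale observations from [low, high] to [0, 1]."""

    def __init__(self, env):
        super().__init__(env)
        self.low = env.observation_space.low
        self.high = env.observation_space.high
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, obs):
        return ((obs - self.low) / (self.high - self.low)).astype(np.float32)


# Usage: env = ScaleObservation(InventoryEnv())  # InventoryEnv from the Chapter 1 sketch
```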
Chapter 3
In Chapter 3, you will learn how to try out various performance-boosting ideas quickly by running Ray Tune experiments.
- You will be able to run parallel experiments using custom environments and custom wrappers (see the sketch after this list).
- You will be able to allocate CPU and GPU resources efficiently on your local Ray cluster for the fastest experiment execution.
- You will visualize experiment results using TensorBoard and pick the best learning settings.
- You will be able to benchmark your best results against baselines.
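Here is a rough sketch of what such an experiment can look like with the classic tune.run API. It reuses the illustrative InventoryEnv and ScaleObservation classes from the earlier sketches, and the config values are placeholders rather than the course's recommended settings.

```python
# Illustrative sketch only -- a small parallel grid-search experiment with RLlib's PPO.
# InventoryEnv and ScaleObservation are the illustrative classes sketched above.
import ray
from ray import tune
from ray.tune.registry import register_env


def make_env(env_config):
    # Wrap the custom environment so observations arrive scaled to [0, 1].
    return ScaleObservation(InventoryEnv())


if __name__ == "__main__":
    ray.init()
    register_env("inventory-v0", make_env)
    tune.run(
        "PPO",
        config={
            "env": "inventory-v0",
            "num_workers": 2,                         # rollout workers per trial
            "gamma": 0.99,
            "lr": tune.grid_search([1e-4, 5e-4]),     # two trials, run in parallel
        },
        stop={"training_iteration": 50},
    )
```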
Chapter 4
In Chapter 4, you will learn how to tune hyperparameters to give your agent another boost in performance.
- You will know which hyperparameters you need to tune and their potential ranges.
- You will be able to tune hyperparameters using simple methods like grid search and advanced methods like Population Based Training, as sketched below.
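The sketch below shows roughly how a Population Based Training schedule can be attached to such an experiment. The hyperparameter ranges are placeholders, and "inventory-v0" refers to the environment registered in the previous sketch, not the course's exact setup.

```python
# Illustrative sketch only -- PBT over a few PPO hyperparameters.
# "inventory-v0" is the illustrative env registered in the previous sketch.
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=10,              # mutate hyperparameters every 10 iterations
    hyperparam_mutations={
        "lr": tune.loguniform(1e-5, 1e-3),
        "clip_param": tune.uniform(0.1, 0.3),
        "train_batch_size": [2000, 4000, 8000],
    },
)

tune.run(
    "PPO",
    scheduler=pbt,
    num_samples=4,                         # population of four concurrent trials
    config={"env": "inventory-v0", "num_workers": 1},
    stop={"training_iteration": 100},
)
```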
Course Curriculum
- Strategy for Solving Real World Deep RL Problems using Gym and RLlib (4:27)
- The Inventory Management Environment: State, Action, Reward and Transition (10:06)
- Markov Decision Process (MDP) (3:25)
- Turning the Inventory Management Environment into an MDP (7:50)
- Setting up the Conda Development Environment (7:48)
- How to Implement a Custom Gym Environment Part 1: Required and Optional Methods (8:42)
- How to Implement a Custom Gym Environment Part 2: Defining Observation and Action Space (12:58)
- How to Implement a Custom Gym Environment Part 3: Coding the reset() Method (10:39)
- How to Implement a Custom Gym Environment Part 4: Coding the step() Method (10:20)
- How to Implement a Custom Gym Environment Part 5: Testing the step() Method (8:01)
- Using Ray-RLlib to Solve a Custom Environment (6:39)
- How to Get the Agent to Learn: An Overview (10:40)
- Using Gym Wrappers to Modify Environments (13:05)
- Writing a Gym Wrapper to Normalize Observations (8:33)
- How to Use Gym's Built-in NormalizeObservation Wrapper (5:46)
- Action Normalization (5:55)
- Writing a Gym Wrapper to Reduce Variance of Stepwise Rewards (13:02)
- How to Use Gym's Built-in NormalizeReward Wrapper (2:40)
- How to Use Custom Environments and Custom Wrappers with Ray RLlib (18:59)
- Running Experiments in Parallel Using Grid Search (13:56)
- CPU and GPU Resources Consumed by Parallel Experiments (9:26)
- Running Parallel Experiments with Fewer Resources (12:07)
- Interpreting Experiment Results using Tensorboard (10:03)
- Baselines (9:44)
- Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 1 (15:33)
- Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 2 (5:21)
- Which PPO Hyperparameters Should We Tune? (11:54)
- Tuning PPO Hyperparameters Using Ray Tune's Grid Search (12:57)
- Population Based Training (PBT) for Faster and Better Hyperparameter Tuning (6:59)
- Implementing Population Based Training in Ray Tune (19:43)
- Does Population Based Training Give Us Better Performance? (4:32)
Planned features
Easy to digest
Bite-sized video lessons with no fluff (on average about 10 minutes long and rarely over 15).
The whole course can be completed in 8 hours (including exercises).
Easy to follow
All videos will have closed captions.
Learn by doing
Video lessons and demonstrations will be followed by coding exercises whenever possible.
Project based
The exercises will be part of an overarching project, where you will teach an agent to manage shop inventory.
Hi, I am Dibya, the instructor of this course 👋
- I am a Senior Python Developer, working closely with one of the biggest automotive companies in Germany.
- I organize the 3000 member Python Meetup community in Munich, Germany.
- I teach a DataCamp course on Unit Testing for Data Science, with 17,500+ students.
- I have trained 300+ developers in the domain of Deep RL.
Pre-enroll to support this course🌱
If you want to support this course, please pre-enroll before the launch in September 2023. As an early supporter, you will get some nice perks: immediate access to all lessons and a 50% discount on the $90 launch price.
To ensure you don't have a bad experience, your enrollment is covered by an unconditional money-back guarantee until launch, plus an additional 14 days after the launch.
Thank you in advance for supporting this course!