Completed The Fast Deep RL Course? Then you know how to solve simple problems using OpenAI Gym and Ray-RLlib. You also have a solid grounding in Deep RL.
The next obvious step is to apply this knowledge to real-world problems. At this juncture, it is common to face the following hurdles.
- No premade Gym environment exists for the problem you want to attack. You must define and create your own custom environment.
- After creating a custom environment, you apply an appropriate Deep RL algorithm with the framework's default settings. But the agent doesn't seem to learn anything, or it learns poorly.
In this course, you will gain the skills needed to overcome these hurdles and become effective in real-world applications.
You will learn the main ideas behind designing and implementing custom Gym environments for real-world problems.
You will also learn how to apply several performance-enhancing tricks, and how to run Ray Tune experiments to easily identify the most promising ones. Here are the tricks we will cover.
- Redefining the observations and actions so that the environment is as close as possible to a Markov Decision Process
- Scaling observations, actions, and rewards to standard ranges preferred by Deep Neural Nets
- Shaping reward functions to help the agent better distinguish between good and bad actions
- Preprocessing the inputs to reduce complexity while preserving critical information
- Tuning the network size and hyperparameters of the Ray-RLlib algorithms to improve performance
Remember the shop inventory management problem from The Fast Deep RL Course? That's an example of a real-world problem.
By the end of the course, you will make a custom Gym environment for inventory management, shape rewards, normalize observations and actions, tune hyperparameters, and much more. By running Ray Tune experiments, you will find the best learning settings and create a Deep RL agent that performs better than classical inventory management techniques.
After solving this example problem, you will be able to use a step-by-step method to solve real-world Deep RL problems that you encounter in your industry.
If you liked The Fast Deep RL Course, I think you will like this course too. After all, real-world application is the next natural and exciting step.
I am looking forward to seeing you inside!
Prerequisites
- You should be able to use Ray RLlib to solve OpenAI Gym environments. This means that you know the basic Python API of OpenAI Gym and the basic Python API of Ray RLlib. If you completed The Fast Deep RL Course, then you already know this stuff.
What will you learn?
Chapter 1
In Chapter 1, you will learn how to create custom Gym environments.
- You will be able to design observations and actions such that the environment is close to a Markov Decision Process. This gives Deep RL algorithms the best chance for learning.
- You will be able to implement custom Gym environments by inheriting the base environment class and defining the required methods.
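To give a flavor of what this looks like, here is a minimal sketch of a custom environment, assuming the classic Gym API (reset() returns an observation; step() returns observation, reward, done, info). The ShopInventoryEnv name and its simple dynamics are illustrative placeholders, not the exact environment built in the course.

```python
import gym
import numpy as np
from gym import spaces


class ShopInventoryEnv(gym.Env):
    """Toy inventory environment: order stock each day, sell against random demand."""

    def __init__(self, max_inventory=100, max_days=30):
        self.max_inventory = max_inventory
        self.max_days = max_days
        # Observation: current inventory level. Action: how many units to order today.
        self.observation_space = spaces.Box(
            low=0.0, high=float(max_inventory), shape=(1,), dtype=np.float32
        )
        self.action_space = spaces.Discrete(max_inventory + 1)

    def reset(self):
        self.inventory = 0
        self.day = 0
        return np.array([self.inventory], dtype=np.float32)

    def step(self, action):
        # Receive today's order, capped at shelf capacity.
        self.inventory = min(self.inventory + int(action), self.max_inventory)
        demand = np.random.poisson(10)
        sold = min(demand, self.inventory)
        self.inventory -= sold
        # Revenue from sales minus a holding cost for leftover stock.
        reward = 5.0 * sold - 1.0 * self.inventory
        self.day += 1
        done = self.day >= self.max_days
        return np.array([self.inventory], dtype=np.float32), reward, done, {}
```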
Chapter 2
In Chapter 2, you will learn how to scale observations and actions, and shape rewards using Gym Wrappers.
- You will be able to modify your custom environments further by using Gym Wrappers.
- You will write wrappers for scaling observations and actions to ranges preferred by Deep Neural Nets.
- You will write several wrappers to shape the reward function.
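As a preview, here are two tiny wrapper sketches, assuming the classic Gym wrapper API. The scaling constants are illustrative assumptions, not the values used in the course.

```python
import gym
import numpy as np
from gym import spaces


class ScaleObservation(gym.ObservationWrapper):
    """Rescale Box observations from [0, high] to [0, 1], a range neural nets handle well."""

    def __init__(self, env):
        super().__init__(env)
        self.high = env.observation_space.high
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, obs):
        return (obs / self.high).astype(np.float32)


class ScaleReward(gym.RewardWrapper):
    """Shrink stepwise rewards so their magnitude stays small and stable."""

    def __init__(self, env, scale=0.01):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward * self.scale


# Usage: wrap the custom environment before handing it to the RL algorithm,
# e.g. env = ScaleReward(ScaleObservation(ShopInventoryEnv())).
```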
Chapter 3
In Chapter 3, you will learn how to try out various performance-boosting ideas quickly by running Ray Tune experiments.
- You will be able to run experiments in parallel using custom environments and custom wrappers (see the sketch after this list).
- You will be able to allocate CPU and GPU resources efficiently on your local Ray cluster for the fastest experiment execution.
- You will visualize experiment results using Tensorboard and pick the best learning settings.
- You will be able to benchmark your best results against baselines.
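For illustration, a grid-search experiment over environment wrappers and learning rates might be set up roughly like this, assuming the classic Ray Tune API (tune.run with the "PPO" trainable). The make_env factory, the hypothetical my_envs module, and the grid values are placeholders.

```python
import ray
from ray import tune
from ray.tune.registry import register_env

# Hypothetical module holding the earlier sketches of the environment and wrappers.
from my_envs import ShopInventoryEnv, ScaleObservation


def make_env(env_config):
    # Build the custom environment with whichever wrappers the trial's config asks for.
    env = ShopInventoryEnv()
    if env_config.get("scale_obs", False):
        env = ScaleObservation(env)
    return env


if __name__ == "__main__":
    ray.init()
    register_env("shop_inventory", make_env)
    tune.run(
        "PPO",
        config={
            "env": "shop_inventory",
            # Each grid_search value spawns its own trial; trials run in parallel
            # as long as CPU resources are available.
            "env_config": {"scale_obs": tune.grid_search([True, False])},
            "lr": tune.grid_search([1e-4, 1e-3]),
            "num_workers": 1,
            "num_gpus": 0,
        },
        stop={"training_iteration": 50},
    )
```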
Chapter 4
In Chapter 4, you will learn how to tune hyperparameters to give your agent another boost in performance.
- You will know which hyperparameters you need to tune and their potential ranges.
- You will be able to tune hyperparameters using simple methods like grid search and advanced methods like Population Based Training.
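As a rough preview, a Population Based Training run with Ray Tune could look like the sketch below (classic Ray Tune API assumed). The mutation ranges and the stand-in CartPole-v1 environment are illustrative; in the course you tune the custom inventory environment.

```python
import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

if __name__ == "__main__":
    ray.init()
    pbt = PopulationBasedTraining(
        time_attr="training_iteration",
        metric="episode_reward_mean",
        mode="max",
        perturbation_interval=10,
        hyperparam_mutations={
            # Values each member of the population may mutate to during training.
            "lr": [1e-5, 1e-4, 1e-3],
            "clip_param": [0.1, 0.2, 0.3],
        },
    )
    tune.run(
        "PPO",
        scheduler=pbt,
        num_samples=4,  # population size
        config={
            "env": "CartPole-v1",  # stand-in; the course tunes the inventory environment
            "lr": 1e-4,
            "clip_param": 0.2,
        },
        stop={"training_iteration": 100},
    )
```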
Course Curriculum
- Strategy for Solving Real World Deep RL Problems using Gym and RLlib (4:27)
- The Inventory Management Environment: State, Action, Reward and Transition (10:06)
- Markov Decision Process (MDP) (3:25)
- Turning the Inventory Management Environment into an MDP (7:50)
- Setting up the Conda Development Environment (7:48)
- How to Implement a Custom Gym Environment Part 1: Required and Optional Methods (8:42)
- How to Implement a Custom Gym Environment Part 2: Defining Observation and Action Space (12:58)
- Coding Exercise: Inventory Management with Penalty for Unfulfilled Demand
- How to Implement a Custom Gym Environment Part 3: Coding the reset() Method (10:39)
- Coding Exercise: Implement reset() for the Hard Inventory Management Problem
- How to Implement a Custom Gym Environment Part 4: Coding the step() Method (10:20)
- How to Implement a Custom Gym Environment Part 5: Testing the step() Method (8:01)
- Coding Exercise: Implement step() for the Hard Inventory Management Problem
- Using Ray-RLlib to Solve a Custom Environment (6:39)
- How to Get the Agent to Learn: An Overview (10:40)
- Using Gym Wrappers to Modify Environments (13:05)
- Writing a Gym Wrapper to Normalize Observations (8:33)
- Coding Exercise: Use Wrappers to Derive the Hard Inventory Management Problem
- How to Use Gym's Built-in NormalizeObservation Wrapper (5:46)
- Action Normalization (5:55)
- Writing a Gym Wrapper to Reduce Variance of Stepwise Rewards (13:02)
- Coding Exercise: Use a Wrapper to Implement Goodwill Penalty
- Coding Exercise: Edit the Reward Scaling Wrapper to Account for Goodwill Penalty
- How to Use Gym's Built-in NormalizeReward Wrapper (2:40)
- How to Use Custom Environments and Custom Wrappers with Ray RLlib (18:59)
- Running Experiments in Parallel Using Grid Search (13:56)
- CPU and GPU Resources Consumed by Parallel Experiments (9:26)
- Running Parallel Experiments with Fewer Resources (12:07)
- Interpreting Experiment Results using Tensorboard (10:03)
- Coding Exercise: Find Best Wrapper Combination in the Hard Inventory Management Problem
- Coding Exercise: Performance Variance in Identical Trials
- Baselines (9:44)
- Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 1 (15:33)
- Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 2 (5:21)
- Coding Exercise: Compute Baseline and Compare with Best Agent's Performance
- Which PPO Hyperparameters Should We Tune? (11:54)
- Tuning PPO Hyperparameters Using Ray Tune's Grid Search (12:57)
- Coding Exercise: Find the Best Layer Sizes and Activation Function
- Population Based Training (PBT) for Faster and Better Hyperparameter Tuning (6:59)
- Implementing Population Based Training in Ray Tune (19:43)
- Does Population Based Training Give Us Better Performance? (4:32)
- Coding Exercise: Does a PBT Optimized Agent Beat the Baseline in the Hard Inventory Management Problem?
- Rate and Review the Course
Features
Easy to digest
Bite-sized video lessons with no fluff (on average about 10 minutes long and rarely over 15 minutes).
The whole course can be completed in 8 hours (including exercises).
Easy to follow
All videos will have closed captions.
Learn by doing
Video lessons and demonstrations will be followed by coding exercises whenever possible.
Project based
The exercises will be part of an overarching project, where you will teach an agent to manage shop inventory.
Hi, I am Dibya, the instructor of this course 👋
- I am a Senior Python Engineer based in Germany. I have worked on engineering projects for some of the biggest automotive companies and for the German government.
- I founded the Artificial General Intelligence community and co-organize the Python developer community in Munich, Germany.
- I like teaching what I know. I have trained thousands of Data Engineers and Data Scientists on Datacamp, the world's largest Data Science education platform.
Enrollment is risk free
Your purchase is protected by a 14-day money-back guarantee. If the course doesn't meet your needs for any reason, let me know within 14 days of the purchase, and you will get a full refund. No questions asked.
Have any other questions prior to enrollment? Please drop me a message.