Real World Deep RL
Implementing Custom Gym Environments
Strategy for Solving Real World Deep RL Problems using Gym and RLlib (4:27)
The Inventory Management Environment: State, Action, Reward and Transition (10:06)
Markov Decision Process (MDP) (3:25)
Turning the Inventory Management Environment into an MDP (7:50)
Setting up the Conda Development Environment (7:48)
How to Implement a Custom Gym Environment Part 1: Required and Optional Methods (8:42)
How to Implement a Custom Gym Environment Part 2: Defining Observation and Action Space (12:58)
Coding Exercise: Inventory Management with Penalty for Unfulfilled Demand
How to Implement a Custom Gym Environment Part 3: Coding the reset() Method (10:39)
Coding Exercise: Implement reset() for the Hard Inventory Management Problem
How to Implement a Custom Gym Environment Part 4: Coding the step() Method (10:20)
How to Implement a Custom Gym Environment Part 5: Testing the step() Method (8:01)
Coding Exercise: Implement step() for the Hard Inventory Management Problem
Using Ray RLlib to Solve a Custom Environment (6:39)
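For orientation, here is a minimal sketch of the custom Gym environment pattern these lessons develop: defining observation and action spaces, coding reset(), and coding step(). The SimpleInventoryEnv class, its cost constants, and the Poisson demand are illustrative assumptions rather than the course's environment, and the classic Gym API is assumed (newer Gym/Gymnasium releases return terminated/truncated from step() and (obs, info) from reset()).

```python
import gym
import numpy as np
from gym import spaces


class SimpleInventoryEnv(gym.Env):
    """Hypothetical single-product inventory environment (illustrative only)."""

    def __init__(self, max_inventory=100, max_order=50, episode_length=30):
        super().__init__()
        self.max_inventory = max_inventory
        self.episode_length = episode_length
        # Observation: current on-hand inventory level.
        self.observation_space = spaces.Box(
            low=0.0, high=float(max_inventory), shape=(1,), dtype=np.float32
        )
        # Action: number of units to order this step.
        self.action_space = spaces.Discrete(max_order + 1)

    def reset(self):
        self.inventory = self.max_inventory // 2
        self.t = 0
        return np.array([self.inventory], dtype=np.float32)

    def step(self, action):
        # Receive the order, observe random demand, sell what we can.
        self.inventory = min(self.inventory + action, self.max_inventory)
        demand = np.random.poisson(10)
        sold = min(demand, self.inventory)
        self.inventory -= sold
        # Revenue minus ordering and holding costs (illustrative numbers).
        reward = 5.0 * sold - 1.0 * action - 0.1 * self.inventory
        self.t += 1
        done = self.t >= self.episode_length
        return np.array([self.inventory], dtype=np.float32), float(reward), done, {}
```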
Observation/Action Normalization and Reward Scaling using Gym Wrappers
How to Get the Agent to Learn: An Overview (10:40)
Using Gym Wrappers to Modify Environments (13:05)
Writing a Gym Wrapper to Normalize Observations (8:33)
Coding Exercise: Use Wrappers to Derive the Hard Inventory Management Problem
How to Use Gym's Built-in NormalizeObservation Wrapper (5:46)
Action Normalization (5:55)
Writing a Gym Wrapper to Reduce Variance of Stepwise Rewards (13:02)
Coding Exercise: Use a Wrapper to Implement Goodwill Penalty
Coding Exercise: Edit the Reward Scaling Wrapper to Account for Goodwill Penalty
How to Use Gym's Built-in NormalizeReward Wrapper (2:40)
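The wrapper lessons above follow Gym's ObservationWrapper and RewardWrapper pattern. A minimal sketch, assuming a Box observation space with finite bounds; the class names and the scale constant are illustrative, not the course's own wrappers.

```python
import gym
import numpy as np


class ScaleObservation(gym.ObservationWrapper):
    """Rescale Box observations to [0, 1] using the space's fixed bounds."""

    def __init__(self, env):
        super().__init__(env)
        self._low = env.observation_space.low
        self._high = env.observation_space.high
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, obs):
        return ((obs - self._low) / (self._high - self._low)).astype(np.float32)


class ScaleReward(gym.RewardWrapper):
    """Divide stepwise rewards by a constant to reduce their variance."""

    def __init__(self, env, scale=100.0):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward / self.scale
```

Recent Gym releases also ship running-statistics versions of these ideas, gym.wrappers.NormalizeObservation and gym.wrappers.NormalizeReward, which the lessons above cover.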
Running Ray Tune Experiments
How to Use Custom Environments and Custom Wrappers with Ray RLlib (18:59)
Running Experiments in Parallel Using Grid Search (13:56)
CPU and GPU Resources Consumed by Parallel Experiments (9:26)
Running Parallel Experiments with Fewer Resources (12:07)
Interpreting Experiment Results using Tensorboard (10:03)
Coding Exercise: Find Best Wrapper Combination in the Hard Inventory Management Problem
Coding Exercise: Performance Variance in Identical Trials
Baselines (9:44)
Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 1 (15:33)
Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 2 (5:21)
Coding Exercise: Compute Baseline and Compare with Best Agent's Performance
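As a rough sketch of how these lessons fit together (registering a custom environment with RLlib, launching parallel grid-search trials, and comparing results through the returned analysis object), assuming the classic tune.run API; newer Ray releases use tune.Tuner instead, and the environment module and registered name below are hypothetical.

```python
import ray
from ray import tune
from ray.tune.registry import register_env

from inventory_env import SimpleInventoryEnv  # hypothetical module holding the env sketched earlier

# Register the custom environment under a string name that RLlib can look up.
register_env("inventory-v0", lambda env_config: SimpleInventoryEnv())

ray.init()
analysis = tune.run(
    "PPO",
    config={
        "env": "inventory-v0",
        "num_workers": 2,
        # grid_search launches one trial per listed value; trials run in
        # parallel as long as CPU/GPU resources allow.
        "lr": tune.grid_search([1e-4, 5e-5]),
        "gamma": tune.grid_search([0.99, 0.999]),
    },
    stop={"training_iteration": 50},
)

# The ExperimentAnalysis object lets you compare trials, e.g. pick the
# config with the highest mean episode reward.
print(analysis.get_best_config(metric="episode_reward_mean", mode="max"))
```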
Boosting Performance Further Using Hyperparameter Tuning
Which PPO Hyperparameters Should We Tune? (11:54)
Tuning PPO Hyperparameters Using Ray Tune's Grid Search (12:57)
Coding Exercise: Find the Best Layer Sizes and Activation Function
Population Based Training (PBT) for Faster and Better Hyperparameter Tuning (6:59)
Implementing Population Based Training in Ray Tune (19:43)
Does Population Based Training Give Us Better Performance? (4:32)
Coding Exercise: Does a PBT Optimized Agent Beat the Baseline in the Hard Inventory Management Problem?
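A sketch of Population Based Training in Ray Tune along the lines of the lessons above; the mutation ranges, population size, and environment name are illustrative assumptions, not the course's settings.

```python
import random

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# Periodically clone the best-performing trials and perturb their hyperparameters.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=10,
    hyperparam_mutations={
        "lr": lambda: random.uniform(1e-5, 1e-3),
        "clip_param": [0.1, 0.2, 0.3],
    },
)

analysis = tune.run(
    "PPO",
    config={
        "env": "inventory-v0",  # hypothetical registered env from the earlier sketch
        "num_workers": 2,
        "lr": 1e-4,
        "clip_param": 0.2,
    },
    scheduler=pbt,
    num_samples=4,  # population size: four concurrent PPO trials
    stop={"training_iteration": 100},
)
```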
Rate and Review the Course