Real World Deep RL
Implementing Custom Gym Environments
Strategy for Solving Real World Deep RL Problems using Gym and RLlib (4:27)
The Inventory Management Environment: State, Action, Reward and Transition (10:06)
Markov Decision Process (MDP) (3:25)
Turning the Inventory Management Environment into an MDP (7:50)
Setting up the Conda Development Environment (7:48)
How to Implement a Custom Gym Environment Part 1: Required and Optional Methods (8:42)
How to Implement a Custom Gym Environment Part 2: Defining Observation and Action Space (12:58)
Coding Exercise: Inventory Management with Penalty for Unfulfilled Demand
How to Implement a Custom Gym Environment Part 3: Coding the reset() Method (10:39)
Coding Exercise: Implement reset() for the Hard Inventory Management Problem
How to Implement a Custom Gym Environment Part 4: Coding the step() Method (10:20)
How to Implement a Custom Gym Environment Part 5: Testing the step() Method (8:01)
Coding Exercise: Implement step() for the Hard Inventory Management Problem
Using Ray-RLlib to Solve a Custom Environment (6:39)
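The lessons in this section build a custom inventory management environment piece by piece. As a rough illustration of the parts those lessons name (observation and action spaces, reset(), step()), here is a minimal sketch of a custom Gym environment. The class name InventoryEnv, the capacity, demand, and price numbers, and the reward formula are illustrative assumptions, not the course's actual environment, and the sketch uses the classic Gym API (reset() returning only an observation, step() returning a 4-tuple), which differs from newer Gymnasium releases.

```python
import gym
import numpy as np
from gym import spaces


class InventoryEnv(gym.Env):
    """Toy inventory-management environment (illustrative numbers only)."""

    def __init__(self, max_capacity=100, max_demand=20, episode_len=30):
        super().__init__()
        self.max_capacity = max_capacity
        self.max_demand = max_demand
        self.episode_len = episode_len
        # Observation: current stock level. Action: units to order this step.
        self.observation_space = spaces.Box(
            low=0.0, high=max_capacity, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(
            low=0.0, high=max_capacity, shape=(1,), dtype=np.float32)

    def reset(self):
        self.stock = 0.0
        self.t = 0
        return np.array([self.stock], dtype=np.float32)

    def step(self, action):
        # Clip the order so stock never exceeds warehouse capacity.
        order = float(np.clip(action[0], 0.0, self.max_capacity - self.stock))
        demand = float(np.random.randint(0, self.max_demand + 1))
        sold = min(self.stock + order, demand)
        # Revenue for units sold minus purchase and holding costs (arbitrary prices).
        reward = 2.0 * sold - 1.0 * order - 0.1 * self.stock
        self.stock = self.stock + order - sold
        self.t += 1
        done = self.t >= self.episode_len
        return np.array([self.stock], dtype=np.float32), reward, done, {}
```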
Observation/Action Normalization and Reward Scaling using Gym Wrappers
How to Get the Agent to Learn: An Overview (10:40)
Using Gym Wrappers to Modify Environments (13:05)
Writing a Gym Wrapper to Normalize Observations (8:33)
Coding Exercise: Use Wrappers to Derive the Hard Inventory Management Problem
How to Use Gym's Built-in NormalizeObservation Wrapper (5:46)
Action Normalization (5:55)
Writing a Gym Wrapper to Reduce Variance of Stepwise Rewards (13:02)
Coding Exercise: Use a Wrapper to Implement Goodwill Penalty
Coding Exercise: Edit the Reward Scaling Wrapper to Account for Goodwill Penalty
How to Use Gym's Built-in NormalizeReward Wrapper (2:40)
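The lessons in this section modify the environment with Gym wrappers rather than by editing its code. As a hedged sketch of that pattern, the two wrappers below rescale Box observations into [0, 1] and divide stepwise rewards by a constant; the class names and the scale value are illustrative assumptions, not the course's code, and the observation wrapper assumes the space has finite, distinct bounds.

```python
import gym
import numpy as np


class NormalizeObservationWrapper(gym.ObservationWrapper):
    """Rescale Box observations into [0, 1] using the space's known bounds."""

    def __init__(self, env):
        super().__init__(env)
        # Assumes finite low/high bounds on the wrapped observation space.
        self.low = env.observation_space.low
        self.high = env.observation_space.high
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32)

    def observation(self, obs):
        return ((obs - self.low) / (self.high - self.low)).astype(np.float32)


class ScaleRewardWrapper(gym.RewardWrapper):
    """Divide stepwise rewards by a constant to shrink their magnitude and variance."""

    def __init__(self, env, scale=100.0):
        super().__init__(env)
        self.scale = scale

    def reward(self, reward):
        return reward / self.scale
```

Wrappers compose by nesting, e.g. ScaleRewardWrapper(NormalizeObservationWrapper(env)), so observation and reward transformations can be mixed and matched per experiment.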
Running Ray Tune Experiments
How to Use Custom Environments and Custom Wrappers with Ray RLlib (18:59)
Running Experiments in Parallel Using Grid Search (13:56)
CPU and GPU Resources Consumed by Parallel Experiments (9:26)
Running Parallel Experiments with Fewer Resources (12:07)
Interpreting Experiment Results using Tensorboard (10:03)
Coding Exercise: Find Best Wrapper Combination in the Hard Inventory Management Problem
Coding Exercise: Performance Variance in Identical Trials
Baselines (9:44)
Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 1 (15:33)
Using Ray Tune's Analysis Module to Compare Trained RL Agent's Performance with Baseline: Part 2 (5:21)
Coding Exercise: Compute Baseline and Compare with Best Agent's Performance
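The lessons in this section run Ray Tune experiments against the custom environment. The sketch below shows one common way to register a custom, wrapped environment with RLlib and launch a small PPO grid search; it assumes the older tune.run interface (Ray 1.x era), the environment and wrapper classes from the earlier sketches (imported here from a hypothetical my_envs module), and illustrative hyperparameter values, not the course's actual configuration.

```python
import ray
from ray import tune
from ray.tune.registry import register_env

# Hypothetical module holding the InventoryEnv and wrapper sketches shown earlier.
from my_envs import InventoryEnv, NormalizeObservationWrapper


def env_creator(env_config):
    # RLlib calls this per worker with the trial's env_config; wrap the raw env here.
    return NormalizeObservationWrapper(InventoryEnv())


register_env("inventory-v0", env_creator)

ray.init()
analysis = tune.run(
    "PPO",                                     # RLlib's built-in PPO trainable
    config={
        "env": "inventory-v0",
        "num_workers": 2,                      # parallel rollout workers per trial
        "lr": tune.grid_search([1e-4, 1e-5]),  # one trial per grid value
    },
    stop={"training_iteration": 50},
    local_dir="results",                       # TensorBoard event files land here
)
print(analysis.dataframe().head())             # summary of finished trials
```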
Boosting Performance Further Using Hyperparameter Tuning
Which PPO Hyperparameters Should We Tune? (11:54)
Tuning PPO Hyperparameters Using Ray Tune's Grid Search (12:57)
Coding Exercise: Find the Best Layer Sizes and Activation Function
Population Based Training (PBT) for Faster and Better Hyperparameter Tuning (6:59)
Implementing Population Based Training in Ray Tune (19:43)
Does Population Based Training Give Us Better Performance? (4:32)
Coding Exercise: Does a PBT Optimized Agent Beat the Baseline in the Hard Inventory Management Problem?
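The final section tunes PPO hyperparameters, including with Population Based Training. Below is a minimal sketch of a PBT schedule in Ray Tune under the same older tune.run API assumed above; the mutation ranges, population size, and the "inventory-v0" environment name (registered as in the previous sketch) are illustrative assumptions, not the course's settings.

```python
import ray
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

# Assumes "inventory-v0" was registered with register_env as in the previous sketch.

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=10,          # exploit/explore every 10 training iterations
    hyperparam_mutations={
        # Perturbed trials resample from these lists.
        "lr": [1e-3, 5e-4, 1e-4, 5e-5],
        "clip_param": [0.1, 0.2, 0.3],
    },
)

ray.init()
analysis = tune.run(
    "PPO",
    scheduler=pbt,
    num_samples=4,                     # population size: 4 concurrent trials
    metric="episode_reward_mean",
    mode="max",
    config={
        "env": "inventory-v0",
        "lr": 1e-4,                    # starting values before any mutation
        "clip_param": 0.2,
    },
    stop={"training_iteration": 100},
)
print(analysis.best_config)            # config of the best-performing trial
```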
Rate and Review the Course