-
Relative Entropy Pathwise Policy Optimization - Technical Overview
A lightweight overview of the new REPPO algorithm
-
REPPO - Why build a new algorithm
A tongue-in-cheek history of REPPO
-
Loss Functions and Calibration
Reminder to post about CVAML
-
Reward Design and Termination
Understanding the interplay between reward design, termination, and truncation in RL