A Hands-On Guide to Policy Gradient Algorithms for Beginners - REINFORCE, A2C, PPO

In the previous article, we discussed two families of model-free RL algorithms: policy-based and value-based. This article focuses on policy-based algorithms, namely REINFORCE, Actor-Critic (A2C), and PPO, and implements each of them on a simple Cart-Pole example.