This week I studied the details of several RL algorithms (PPO, DDPG, TRPO) and implemented them. Please see my Colab Python notebook for more information.
The introduction part explains the key ingredients of each algorithm and their mathematical forms. The implementation part turns those ideas into code.
I am still in the process of learning RL, so the notebooks will be updated regularly.