My ML Syllabus


Objectives

1. Understand neural networks

  - Explore the parameter space of gradient descent. 
  - This is also related to interpretation/visualization for ML -- see Visualization_for_ML.pdf from NeurIPS 2018. 
  - Get started on some SGD problems.

2. Learn reinforcement learning

  - Bandit problem, Thompson sampling -- A Tutorial on Thompson Sampling. 
  - Put more emphasis on MDPs -- follow the OpenAI curriculum. 

3. Possible final projects

  - ML visualization, explore the low-dimensional parameter space. 
  - Implement Soft Actor-Critic (off-policy maximum entropy deep RL). 
  - Apply RL to some real-life problems (business management, finance, or health care).
  - Inspiration: [Use RL on a non-RL problem: Neural architecture search](https://arxiv.org/pdf/1611.01578.pdf)

4. Become a Machine Learning Fellow at OpenAI starting from this summer.

Day-to-Day Plan

The time frame is from Feb 1st to May 2nd. Weeks are divided in the following format: Sat – Fri.

Week 1. Feb 2 – Feb 8. Preparations

Feb 2

 Touring San Francisco (cold and windy)

Feb 3

 Flying back to NJ

Feb 4

 Put everything together

Feb 5

 Read about visualization for ML and put ideas together. Write this syllabus. 

Feb 6

 1. Read the ICLR 2018 Uber research paper and the NeurIPS 2018 Google research paper; write up notes to put the ideas together. Work through the Uber paper's GitHub code. 
 2. Measuring the Intrinsic Dimension of Objective Landscapes: https://eng.uber.com/intrinsic-dimension/
 3. PCA of high-dimensional random walks, compared with neural network training.
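
As a warm-up for that comparison, here is a minimal sketch (my own toy version, not the paper's code) of the analysis I have in mind: take a high-dimensional random walk and check how much variance its trajectory concentrates in the top principal components, the same summary one would compute for an SGD trajectory in weight space.

```python
# Toy sketch: PCA of a high-dimensional random walk via SVD of the trajectory.
import numpy as np

dim, steps = 10_000, 500                  # dimensions and number of steps (arbitrary choices)
rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal((steps, dim)), axis=0)   # T x D trajectory

centered = walk - walk.mean(axis=0)
# Economy SVD of the trajectory gives principal components without a D x D covariance matrix.
_, s, _ = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained by top 5 PCs:", explained[:5])
```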

Feb 7

 4. Read [RL intro](https://github.com/jachiam/rl-intro/blob/master/Presentation/rl_intro.pdf). 
 5. Read [Reinforcement Learning Resources](https://docs.google.com/document/d/1ZqQ-kG1YSErty4hsukYW2B5EeTzIZ3XpIR8_zm2gPxI/edit).
 6. Survey RL applications.
 7. Set up the [GitHub blog](https://jekyllthemes.io/). 

Feb 8

 8. Write a GitHub blog post. 

Week 2 Feb 9 – Feb 15: Convolutional Neural Networks. [Arch0-Convnets]

Feb 9

9. CS231n notes. Great notes by Andrej Karpathy. [notes → paper]
10. Start implementing some of the algorithms. 

Feb 10

11. ImageNet Classification with Deep Convolutional Neural Networks. Historic.
12. Identity Mappings in Deep Residual Networks. ResNet improvement. [Another improvement is the NeurIPS 2018 best paper “Neural Ordinary Differential Equations”; it is more theory-oriented (optional).]
13. Deep Residual Learning for Image Recognition. Big improvement, much deeper (> 100 layer) networks. 

Feb 11

14. Implement ResNet (as described in Deep Residual Learning or the later Identity Mappings paper) and train a classifier on CIFAR-10.
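
A rough sketch of the building block I expect to reuse, assuming PyTorch; this is my reading of the post-activation basic block from the Deep Residual Learning paper, not a verified reproduction.

```python
# Minimal post-activation ResNet basic block (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # 1x1 projection when the shape changes, so the skip connection can be added.
        self.shortcut = nn.Sequential()
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + self.shortcut(x))   # residual addition, then ReLU

x = torch.randn(2, 16, 32, 32)                  # CIFAR-sized feature map
print(BasicBlock(16, 32, stride=2)(x).shape)    # torch.Size([2, 32, 16, 16])
```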

Feb 12

15. Multi-Scale Context Aggregation by Dilated Convolutions [interesting]. Allows larger receptive fields; used in WaveNet. [important]
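
A quick sanity-check sketch (mine, not the paper's) of why dilation helps: stacking 3x3 convolutions with dilations 1, 2, 4 keeps the spatial resolution fixed while the effective receptive field grows to 15x15.

```python
# Receptive-field check for dilated convolutions.
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1, dilation=1),
    nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2),
    nn.Conv2d(8, 8, kernel_size=3, padding=4, dilation=4),
)
x = torch.randn(1, 1, 64, 64)
print(layers(x).shape)  # spatial size stays 64x64; effective receptive field is 15x15
```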

Feb 13

16. Implement some empirical work in https://eng.uber.com/intrinsic-dimension/ 
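
My toy sketch of the idea behind the intrinsic-dimension experiments, with made-up sizes and a stand-in quadratic objective instead of a real network: only a low-dimensional vector `z` is trained, and it is mapped into the full parameter space through a fixed random projection.

```python
# Random-subspace training sketch: theta = theta0 + P @ z, with only z trainable.
import torch

torch.manual_seed(0)
n_params, d_int = 5000, 50                     # full vs. "intrinsic" dimension (made up)
theta0 = torch.randn(n_params)                 # fixed random initialization
P = torch.randn(n_params, d_int) / d_int**0.5  # fixed random projection
z = torch.zeros(d_int, requires_grad=True)     # the only trainable parameters

# Stand-in objective: a random diagonal quadratic over the full parameter vector.
A = torch.randn(n_params) ** 2
target = torch.randn(n_params)

opt = torch.optim.Adam([z], lr=0.1)
for step in range(200):
    theta = theta0 + P @ z                     # parameters restricted to the subspace
    loss = (A * (theta - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final loss:", loss.item())
```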

Feb 14

17. Implementation 6 & 7 

Feb 15

18. Implementation 6 & 7  +  Blogging

Week 3 Feb 16 – Feb 22: RNN and LSTM Models for Sequential Data (Arch1-Recurrent)

Feb 16

19. http://colah.github.io/posts/2015-08-Understanding-LSTMs/ 
20. Read: Char-RNN blog post. Nice inspiring introduction to sequence models and demonstration of what they can do. 
21. Code: Implement Char-RNN
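
A bare-bones Char-RNN sketch to start from (assuming PyTorch; the corpus, sizes, and hyperparameters are placeholders, and this is not Karpathy's code):

```python
# Minimal character-level RNN: embed characters, run an LSTM, predict the next character.
import torch
import torch.nn as nn

text = "hello world, hello char-rnn "          # tiny stand-in corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

x = data[:-1].unsqueeze(0)                     # input characters
y = data[1:].unsqueeze(0)                      # next-character targets
for step in range(100):
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
print("training loss:", loss.item())
```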

Feb 17

22. Read: Char-RNN blog post (continued).
23. Code: Continue implementing Char-RNN.

Feb 18

24. RNN Regularization. Gives a clean and self-contained description of the LSTM and some intuitions about overfitting.

Feb 19

25. ~~DeepSpeech. Application of RNN models to speech recognition. A lot of systems / engineering details.~~ Implement an LSTM language model and train it on a text dataset of my choice.

Feb 20

26. Implement an LSTM language model and train it on a text dataset of my choice.

Feb 21

27. Continue implementing the LSTM language model and training it.

Feb 22

28. Continue implementing the LSTM language model and training it on the chosen dataset.
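
Once the language model trains, I want to check it qualitatively by sampling from it. A sketch, assuming a model with the same interface as the Char-RNN sketch from Feb 16 (`forward(x, state)` returns `(logits, state)`):

```python
# Sampling sketch for a character-level language model (interface assumed as above).
import torch

@torch.no_grad()
def sample(model, stoi, seed="h", length=200, temperature=1.0):
    itos = {i: c for c, i in stoi.items()}         # inverse mapping: index -> character
    model.eval()
    x = torch.tensor([[stoi[c] for c in seed]])
    out, state = list(seed), None
    for _ in range(length):
        logits, state = model(x, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        idx = torch.multinomial(probs, 1).item()   # sample rather than argmax
        out.append(itos[idx])
        x = torch.tensor([[idx]])                  # feed the sampled character back in
    return "".join(out)
```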

Week 4 Feb 23 – Mar 1: Reinforcement Learning Intro (RL0-Intro)

Feb 23 (More Spinning Up than the book. Think about the project after the introduction and setup; models can be replaced.)

29. Read: Chapters 1-3 of the Sutton and Barto textbook (or skim them and make sure you're comfortable with the material). [Highly recommended.]
30. Josh’s Spinning up in RL:  https://spinningup.openai.com/en/latest/
31. Read: https://lilianweng.github.io/lil-log/2018/01/23/the-multi-armed-bandit-problem-and-its-solutions.html
32. Read: A Tutorial on Thompson Sampling (1/3). [Not necessary; may skip if no time. Can start right from MDPs.]

Feb 24

33. Read: A Tutorial on Thompson Sampling (2/3). [Not necessary; may skip if no time. Can start right from MDPs.]
34. Work: https://github.com/iosband/ts_tutorial (a minimal Thompson sampling sketch follows below).
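
The sketch referenced above: a minimal Beta-Bernoulli Thompson sampling loop (mine, far simpler than the ts_tutorial code). Each arm keeps a Beta posterior over its success probability; pull the arm whose posterior sample is largest.

```python
# Beta-Bernoulli Thompson sampling on a 3-armed bandit.
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.3, 0.5, 0.7])      # unknown to the agent
alpha = np.ones(3)                          # Beta prior parameters
beta = np.ones(3)

for t in range(2000):
    theta = rng.beta(alpha, beta)           # one posterior sample per arm
    arm = int(np.argmax(theta))
    reward = rng.random() < true_probs[arm]
    alpha[arm] += reward                    # posterior update
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))  # should concentrate on the best arm
```
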
Feb 25

35. Read Chapters 4-5 of the Sutton and Barto textbook (or skim them and make sure you're comfortable with the material).
36. Work: Set up the Gym environment. Solve the problems in MDP_Review.ipynb, which was originally prepared as a homework assignment by John for the Berkeley deep RL course.

Feb 26

37. Read Chapters 6-7 of the Sutton and Barto textbook (or skim them and make sure you're comfortable with the material).
38. Work: Continue solving the problems in MDP_Review.ipynb.

Feb 27

39. Read Chapter 8 of the Sutton and Barto textbook (or skim it and make sure you're comfortable with the material).
40. Work: Continue solving the problems in MDP_Review.ipynb.

Feb 28

41. Work: Continue solving the problems in MDP_Review.ipynb.

Mar 1

42. Work: Finish the problems in MDP_Review.ipynb.
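
As a warm-up for these MDP exercises, a value-iteration sketch on FrozenLake (my own, not MDP_Review.ipynb; it assumes the classic Gym toy-text API, where the transition model is exposed as `env.unwrapped.P[s][a] = [(prob, next_state, reward, done), ...]`):

```python
# Tabular value iteration on FrozenLake.
import numpy as np
import gym

env = gym.make("FrozenLake-v0")              # use FrozenLake-v1 on newer Gym versions
P = env.unwrapped.P
nS, nA = env.observation_space.n, env.action_space.n
gamma, V = 0.99, np.zeros(nS)

for _ in range(1000):                        # Bellman optimality backups
    V_new = np.zeros(nS)
    for s in range(nS):
        V_new[s] = max(
            sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])
            for a in range(nA)
        )
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("V(start) =", V[0])
```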

Week 5. Mar 2 – Mar 8: RL2 Q-Learning

I plan to start more project-oriented tasks during weeks 5 and 6, so it is hard to plan in detail right now. I will discuss this more with Lilian during our weekly meeting and update the syllabus accordingly.

Mar 2

43. [DQN (Deep Q-Network) Paper](https://arxiv.org/abs/1312.5602)
44. [Rainbow Paper](https://arxiv.org/abs/1710.02298)
45. Implement DQN and run it on Breakout and other Atari games. Compare my performance against the Dopamine implementation. (A minimal TD-target sketch follows this list.)
46. (Optional) Implement C51, one of the key components of Rainbow.
47. (Optional) Choose and implement the second component of Rainbow.
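
The TD-target sketch referenced in item 45: my understanding of the core DQN update (epsilon-greedy behavior plus a one-step target from a frozen target network), not the Dopamine implementation. Names and signatures here are my own.

```python
# Core DQN pieces: TD-target loss and epsilon-greedy action selection (sketch).
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    s, a, r, s2, done = batch                 # tensors from the replay buffer; done is 0/1 float
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)              # Q(s, a)
    with torch.no_grad():
        target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
    return F.smooth_l1_loss(q, target)

def epsilon_greedy(q_net, s, epsilon, n_actions):
    if torch.rand(()) < epsilon:
        return torch.randint(n_actions, ()).item()                 # explore
    return q_net(s.unsqueeze(0)).argmax(dim=1).item()              # exploit
```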

Mar 3

48. DDPG (Deep Deterministic Policy Gradients) Paper
49. TD3 (Twin Delayed Deep Deterministic policy gradients) Paper
50. Implement DDPG, using the provided starter code.

Mar 4

51. SAC (Soft Actor-Critic) Paper
52. HER (Hindsight Experience Replay) Paper
53. Distributional RL (C51)

Mar 5

54. Double DQN
55. Equivalence between policy gradients and soft Q-learning

Mar 6

56. Minimally transform your implementation of DDPG into TD3 (a target-computation sketch follows this list).
57. (Optional) Minimally transform your implementation of TD3 into SAC.
58. Compare DDPG/TD3/SAC performance on the MuJoCo environments HalfCheetah and Walker2d.
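
The sketch referenced in item 56: what I understand to be the main change in the TD3 target computation relative to DDPG (twin critics with a min, plus target policy smoothing). Function and argument names are my own.

```python
# TD3 target computation (sketch).
import torch

def td3_target(r, s2, done, actor_targ, q1_targ, q2_targ,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    with torch.no_grad():
        a2 = actor_targ(s2)
        noise = (noise_std * torch.randn_like(a2)).clamp(-noise_clip, noise_clip)
        a2 = (a2 + noise).clamp(-act_limit, act_limit)        # smoothed target action
        q_min = torch.min(q1_targ(s2, a2), q2_targ(s2, a2))   # clipped double-Q estimate
        return r + gamma * (1 - done) * q_min
```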

Mar 7 Buffer

Mar 8 Buffer

Week 6. Mar 9 – Mar 15: Policy Gradients (RL1-Policy-Gradients)

Mar 9

 59. Pieter Abbeel's lecture at deep RL boot camp. 
 60. https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html 
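
To keep the basic estimator from these readings in mind, a minimal REINFORCE-style loss (my sketch; tensor shapes and the return normalization are my own choices):

```python
# REINFORCE-style policy gradient loss: minimize -E[log pi(a|s) * return].
import torch

def reinforce_loss(logits, actions, returns):
    # logits: (T, n_actions) from the policy net; actions: (T,) int64; returns: (T,) float
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple normalization
    return -(chosen * returns).mean()
```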

Mar 10

 61. Learn A3C, PPO, DDPG, and SAC in depth, and others if time permits.
 62. Chapters 1-2 of John's thesis
 63. A3C (Asynchronous Advantage Actor-Critic) Paper

Mar 11

 64. TRPO (Trust Region Policy Optimization) Paper
 65. TRPO+GAE (Generalized Advantage Estimation) Paper
 66. Exercise 1. Implement A2C and PPO.
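
For Exercise 1, the PPO-clip surrogate I expect to implement, written as a loss (a sketch based on my reading of the paper, not reference code):

```python
# PPO clipped surrogate objective (sketch).
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # pessimistic bound, maximized
```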

Mar 12

 67. PPO (Proximal Policy Optimization) Paper
 68. ACER Paper
 69. Exercise 2. Implement some basic profiling to determine the bottleneck in your training pipeline.
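
For Exercise 2, a first-pass profiling sketch: wrap the training loop with cProfile and check whether time goes to the environment, the forward/backward pass, or data handling. `run_training_steps` is a placeholder for my actual loop.

```python
# Basic profiling of a training loop with cProfile (sketch).
import cProfile
import pstats

def profile_training(run_training_steps, n_steps=1000):
    profiler = cProfile.Profile()
    profiler.enable()
    run_training_steps(n_steps)          # placeholder: the real training loop goes here
    profiler.disable()
    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(20)                # top-20 functions by cumulative time
```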

Mar 13

 70. Reproducibility of Benchmarked Deep RL Tasks
 71. Natural Policy Gradient
 72. Approximately Optimal Approximate Reinforcement Learning
 73. Exercise 3. Log and plot the diagnostic stats covered in John's lecture.
 74. Exercise 5. To get a better understanding of the variance of your results, plot training curves for 8 different seeds in a single environment. Do this for 4 different environments of your choice, including at least one Atari environment and one MuJoCo environment.
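
For Exercise 5, a plotting sketch (matplotlib; `curves` is assumed to already hold per-seed episode returns with shape `(n_seeds, n_points)`):

```python
# Plot per-seed training curves plus a mean +/- std band (sketch).
import numpy as np
import matplotlib.pyplot as plt

def plot_seed_variance(steps, curves, label="env"):
    mean, std = curves.mean(axis=0), curves.std(axis=0)
    for c in curves:
        plt.plot(steps, c, alpha=0.3, linewidth=1)        # individual seeds
    plt.plot(steps, mean, label=f"{label} (mean over {len(curves)} seeds)")
    plt.fill_between(steps, mean - std, mean + std, alpha=0.2)
    plt.xlabel("environment steps")
    plt.ylabel("episode return")
    plt.legend()
    plt.show()
```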

Mar 14

 75. John's lecture at the same boot camp on tuning RL algorithms. [John's boot camp videos are very implementation-focused.]
 76. Implement some algorithms. 

Mar 15

 77. Read and implement: Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Week 7. Mar 16 – Mar 22: RL3-Large_Scale

 78. RL3-Large_Scale topics,
 79. or a buffer week, saved for possible implementation problems. 

Week 8. Mar 23 – Mar 29: RL4-Special Topics

 80. RL4-Special Topics,
 81. or a buffer week, saved for possible implementation problems.

The following weeks are reserved for the project.

Week 9 Mar 30 – Apr 5

Week 10 Apr 6 – Apr 12

Week 11 Apr 13 – Apr 19

Week 12 Apr 20 – Apr 26

Week 13 Apr 27 – May 3
