From d62c50ab97bc8c9a85c451cfd07a1a15c0a70e17 Mon Sep 17 00:00:00 2001
From: Nikolay Chechulin
Date: Fri, 15 Jan 2021 23:33:25 +0300
Subject: [PATCH 1/2] format document

---
 README.md | 65 +++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/README.md b/README.md
index 39dbe03..69cb0f4 100644
--- a/README.md
+++ b/README.md
@@ -3,82 +3,83 @@
 This contains basic tools for implementing Reinforcement Learning algorithms and gym environments. Mainly aiming for systems with continious state space and action space.
 ## gym environments:
+
 - [DC-DC buck converter](rl/gym_env/buck.py)
 - [DC-DC boost converter](rl/gym_env/boost.py)
 - [four node buck (DC) microgrid](rl/gym_env/buck_microgrid.py)
+
 ## RL algorithms
-- ```buck_ddpg``` run DDPG on a simple buck converter environment.
-![DC-DC buck converter](results/results_plot_nice.png)
+
+- `buck_ddpg` run DDPG on a simple buck converter environment.
+  ![DC-DC buck converter](results/results_plot_nice.png)
 # How to use?
-```python buck_ddpg --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01 summary_dir='./results_buck_ddps'```
+
+`python buck_ddpg --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01 summary_dir='./results_buck_ddps'`
 will run the ddpg algorithm on buck converter, with discount factor = 0.9, for 100 episodes, and actor and critic learning rates 0.0001, 0.01, respectively. Finally saves the results in the folder = './results_buck_ddps' (the folder should be available)
 # Complete argument list:
 Use argparse to set the parameters of the desired experiment. Running buck_ddpg.py as a script will then output the results to a named and dated directory in the results folder.
-```summary_dir``` folder path to load and save the model. Saved all the results in .mat format.
+`summary_dir` folder path to load and save the model. Saved all the results in .mat format.
-```save_model``` (```bool```) if ```True``` saves the model in the ```summary_dir```
+`save_model` (`bool`) if `True` saves the model in the `summary_dir`
-```load_model``` (```bool```) if ```True``` loads the model in the ```summary_dir```
+`load_model` (`bool`) if `True` loads the model in the `summary_dir`
-```random_seed``` (```int```) seeding the random number generator (NOT completely implemented)
+`random_seed` (`int`) seeding the random number generator (NOT completely implemented)
-```buffer_size``` (```int```) replay buffer size
+`buffer_size` (`int`) replay buffer size
-```max_episodes``` (```int```) max number of episodes for training
+`max_episodes` (`int`) max number of episodes for training
-```max_episode_len``` (```int```) Number of steps per epsiode
+`max_episode_len` (`int`) Number of steps per epsiode
-```mini_batch_size``` (```int```) sampling batch size drawn from replay buffer
+`mini_batch_size` (`int`) sampling batch size drawn from replay buffer
-```actor_lr``` (```float```) actor network learning rate
+`actor_lr` (`float`) actor network learning rate
-```critic_lr``` (```float```) critic network learning rate
+`critic_lr` (`float`) critic network learning rate
-```gamma``` (```float```) models the long term returns (discount factor)
+`gamma` (`float`) models the long term returns (discount factor)
-```noise_var``` (```float```) starting variance of the exploration noise at each episode, and decreased as the episode progress
+`noise_var` (`float`) starting variance of the exploration noise at each episode, and decreased as the episode progress
-```scaling``` (```bool```) If ```True``` scales the states before using for training
+`scaling` (`bool`) If `True` scales the states before using for training
-```state_dim``` (```int```) state dimension of environment
+`state_dim` (`int`) state dimension of environment
-```action_dim``` (```int```) action space dimension
+`action_dim` (`int`) action space dimension
-```action_bound``` (```float```) upper and lower bound of the actions
+`action_bound` (`float`) upper and lower bound of the actions
-```discretization_time``` (```float```) discretization time used for the environment
+`discretization_time` (`float`) discretization time used for the environment
 ### Actor and Critic network is implemented using LSTM's + two hidden layers
-```time_steps``` (```int```) Number of time-steps for rnn (LSTM)
-
-```actor_rnn``` (```int```) actor network rnn layer paramerters
+`time_steps` (`int`) Number of time-steps for rnn (LSTM)
-```actor_l1``` (```int```) actor network layer 1 parameters
+`actor_rnn` (`int`) actor network rnn layer paramerters
-```actor_l2``` (```int```) actor network layer 2 parameters
+`actor_l1` (`int`) actor network layer 1 parameters
+`actor_l2` (`int`) actor network layer 2 parameters
+`critic_rnn` (`int`) critic network rnn layer paramerters
-```critic_rnn``` (```int```) critic network rnn layer paramerters
-
-```critic_l1``` (```int```) critic network layer 1 parameters
-
-```critic_l2``` (```int```) critic network layer 2 parameters
-
-```tau``` (```float```) target network learning rate
+`critic_l1` (`int`) critic network layer 1 parameters
+`critic_l2` (`int`) critic network layer 2 parameters
+`tau` (`float`) target network learning rate
 # Dependencies
 Written in TensorFlow 2.0 (Keras)
 Requires the following PiPy packages
+
 ```
 import matplotlib.pyplot as plt
 import numpy as np

From 41817ded17126442384e230eb1683b79a9952dfd Mon Sep 17 00:00:00 2001
From: Nikolay Chechulin
Date: Fri, 15 Jan 2021 23:35:38 +0300
Subject: [PATCH 2/2] format argument lists

---
 README.md | 73 +++++++++++++++++++++++++------------------------------------------------
 1 file changed, 25 insertions(+), 48 deletions(-)

diff --git a/README.md b/README.md
index 69cb0f4..c91a9d6 100644
--- a/README.md
+++ b/README.md
@@ -22,57 +22,34 @@ will run the ddpg algorithm on buck converter, with discount factor = 0.9, for 1
 Use argparse to set the parameters of the desired experiment. Running buck_ddpg.py as a script will then output the results to a named and dated directory in the results folder.
-`summary_dir` folder path to load and save the model. Saved all the results in .mat format.
-
-`save_model` (`bool`) if `True` saves the model in the `summary_dir`
-
-`load_model` (`bool`) if `True` loads the model in the `summary_dir`
-
-`random_seed` (`int`) seeding the random number generator (NOT completely implemented)
-
-`buffer_size` (`int`) replay buffer size
-
-`max_episodes` (`int`) max number of episodes for training
-
-`max_episode_len` (`int`) Number of steps per epsiode
-
-`mini_batch_size` (`int`) sampling batch size drawn from replay buffer
-
-`actor_lr` (`float`) actor network learning rate
-
-`critic_lr` (`float`) critic network learning rate
-
-`gamma` (`float`) models the long term returns (discount factor)
-
-`noise_var` (`float`) starting variance of the exploration noise at each episode, and decreased as the episode progress
-
-`scaling` (`bool`) If `True` scales the states before using for training
-
-`state_dim` (`int`) state dimension of environment
-
-`action_dim` (`int`) action space dimension
-
-`action_bound` (`float`) upper and lower bound of the actions
-
-`discretization_time` (`float`) discretization time used for the environment
+- `summary_dir` folder path to load and save the model. Saved all the results in .mat format.
+- `save_model` (`bool`) if `True` saves the model in the `summary_dir`
+- `load_model` (`bool`) if `True` loads the model in the `summary_dir`
+- `random_seed` (`int`) seeding the random number generator (NOT completely implemented)
+- `buffer_size` (`int`) replay buffer size
+- `max_episodes` (`int`) max number of episodes for training
+- `max_episode_len` (`int`) Number of steps per epsiode
+- `mini_batch_size` (`int`) sampling batch size drawn from replay buffer
+- `actor_lr` (`float`) actor network learning rate
+- `critic_lr` (`float`) critic network learning rate
+- `gamma` (`float`) models the long term returns (discount factor)
+- `noise_var` (`float`) starting variance of the exploration noise at each episode, and decreased as the episode progress
+- `scaling` (`bool`) If `True` scales the states before using for training
+- `state_dim` (`int`) state dimension of environment
+- `action_dim` (`int`) action space dimension
+- `action_bound` (`float`) upper and lower bound of the actions
+- `discretization_time` (`float`) discretization time used for the environment
 ### Actor and Critic network is implemented using LSTM's + two hidden layers
-`time_steps` (`int`) Number of time-steps for rnn (LSTM)
-
-`actor_rnn` (`int`) actor network rnn layer paramerters
-
-`actor_l1` (`int`) actor network layer 1 parameters
-
-`actor_l2` (`int`) actor network layer 2 parameters
-
-`critic_rnn` (`int`) critic network rnn layer paramerters
-
-`critic_l1` (`int`) critic network layer 1 parameters
-
-`critic_l2` (`int`) critic network layer 2 parameters
-
-`tau` (`float`) target network learning rate
+- `time_steps` (`int`) Number of time-steps for rnn (LSTM)
+- `actor_rnn` (`int`) actor network rnn layer paramerters
+- `actor_l1` (`int`) actor network layer 1 parameters
+- `actor_l2` (`int`) actor network layer 2 parameters
+- `critic_rnn` (`int`) critic network rnn layer paramerters
+- `critic_l1` (`int`) critic network layer 1 parameters
+- `critic_l2` (`int`) critic network layer 2 parameters
+- `tau` (`float`) target network learning rate
 # Dependencies
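The README text in these patches describes an argparse-driven entry point, but the parser itself is not part of the diff. Below is a minimal Python sketch of what a parser for the listed flags could look like: the argument names and types follow the README list, while the file name, default values, and the use of `store_true` for the boolean flags are assumptions made purely for illustration, not the actual code in buck_ddpg.py.

```python
# Illustrative sketch only -- NOT the actual parser in buck_ddpg.py (not shown in this patch).
# Argument names/types follow the README list; defaults are invented for the example.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="DDPG on a DC-DC buck converter (sketch)")
    parser.add_argument("--summary_dir", type=str, default="./results",
                        help="folder to load/save the model; results are written in .mat format")
    # The README lists these as bool; store_true flags are one common way to expose them.
    parser.add_argument("--save_model", action="store_true", help="save the model in summary_dir")
    parser.add_argument("--load_model", action="store_true", help="load the model from summary_dir")
    parser.add_argument("--random_seed", type=int, default=0, help="random number generator seed")
    parser.add_argument("--buffer_size", type=int, default=100000, help="replay buffer size")
    parser.add_argument("--max_episodes", type=int, default=100, help="max number of training episodes")
    parser.add_argument("--max_episode_len", type=int, default=200, help="number of steps per episode")
    parser.add_argument("--mini_batch_size", type=int, default=64,
                        help="batch size sampled from the replay buffer")
    parser.add_argument("--actor_lr", type=float, default=1e-4, help="actor network learning rate")
    parser.add_argument("--critic_lr", type=float, default=1e-2, help="critic network learning rate")
    parser.add_argument("--gamma", type=float, default=0.9, help="discount factor")
    parser.add_argument("--noise_var", type=float, default=0.1,
                        help="initial exploration-noise variance per episode")
    parser.add_argument("--tau", type=float, default=1e-3, help="target network learning rate")
    return parser


if __name__ == "__main__":
    # e.g. python ddpg_args_sketch.py --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01
    args = build_parser().parse_args()
    print(vars(args))
```

With a parser like this sketch, the `summary_dir` value in the README's example command would be passed with an explicit option prefix, i.e. `--summary_dir='./results_buck_ddps'`.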
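The `time_steps`, `actor_rnn`, `actor_l1`, and `actor_l2` arguments correspond to the "LSTM's + two hidden layers" actor mentioned in the README. The TensorFlow 2 (Keras) sketch below shows one plausible layout of such an actor; the layer sizes, activations, and the tanh scaling by `action_bound` are assumptions for illustration, not the repository's actual network code.

```python
# Rough sketch of an LSTM + two-hidden-layer actor; layer sizes and activations are
# assumed for illustration, not taken from the repository.
import tensorflow as tf


def build_actor(time_steps: int, state_dim: int, action_dim: int, action_bound: float,
                actor_rnn: int = 64, actor_l1: int = 400, actor_l2: int = 300) -> tf.keras.Model:
    states = tf.keras.Input(shape=(time_steps, state_dim))      # sequence of past states
    x = tf.keras.layers.LSTM(actor_rnn)(states)                 # recurrent feature extractor
    x = tf.keras.layers.Dense(actor_l1, activation="relu")(x)   # hidden layer 1
    x = tf.keras.layers.Dense(actor_l2, activation="relu")(x)   # hidden layer 2
    raw = tf.keras.layers.Dense(action_dim, activation="tanh")(x)        # action in [-1, 1]
    actions = tf.keras.layers.Lambda(lambda a: a * action_bound)(raw)    # scale to action bounds
    return tf.keras.Model(states, actions)


# Example: a hypothetical 2-state converter environment with one bounded action.
actor = build_actor(time_steps=5, state_dim=2, action_dim=1, action_bound=1.0)
actor.summary()
```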