This repository contains basic tools for implementing reinforcement learning algorithms and gym environments, mainly aimed at systems with continuous state and action spaces.

## Gym environments

- [DC-DC buck converter](rl/gym_env/buck.py)
- [DC-DC boost converter](rl/gym_env/boost.py)
- [four node buck (DC) microgrid](rl/gym_env/buck_microgrid.py)
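
A minimal usage sketch for one of these environments, assuming it follows the standard gym interface (`reset`/`step`) and that the class in `rl/gym_env/buck.py` is named `Buck`; the class name and the action bound here are assumptions:

```python
import numpy as np

# Assumption: the buck converter environment is exposed as a class
# named Buck in rl/gym_env/buck.py and follows the standard gym API.
from rl.gym_env.buck import Buck

env = Buck()
state = env.reset()
for _ in range(10):
    # Random action within an assumed bound of [-1, 1]
    action = np.random.uniform(-1.0, 1.0, size=(1,))
    state, reward, done, info = env.step(action)
    if done:
        state = env.reset()
```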

## RL algorithms
- `buck_ddpg` runs DDPG on a simple buck converter environment.
![DC-DC buck converter](results/results_plot_nice.png)

# How to use?
`python buck_ddpg.py --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01 --summary_dir='./results_buck_ddps'`
runs DDPG on the buck converter with a discount factor of 0.9 for 100 episodes, using actor and critic learning rates of 0.0001 and 0.01, respectively. The results are saved in the folder './results_buck_ddps' (the folder must already exist).

# Complete argument list:

Use argparse to set the parameters of the desired experiment. Running buck_ddpg.py as a script will then output the results to a named and dated directory in the results folder.
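
For orientation, here is a minimal sketch of what such an argparse setup might look like, covering a few of the flags listed below; the defaults shown are illustrative assumptions, not the repository's actual values:

```python
import argparse

# A sketch of the flag definitions; defaults are illustrative assumptions.
parser = argparse.ArgumentParser(description='DDPG on a DC-DC converter')
parser.add_argument('--summary_dir', type=str, default='./results')
parser.add_argument('--gamma', type=float, default=0.99)
parser.add_argument('--max_episodes', type=int, default=100)
parser.add_argument('--actor_lr', type=float, default=0.0001)
parser.add_argument('--critic_lr', type=float, default=0.01)
args = parser.parse_args()

print(args.gamma, args.summary_dir)
```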

- `summary_dir` folder path for loading and saving the model; all results are saved in .mat format
- `save_model` (`bool`) if `True`, saves the model in `summary_dir`
- `load_model` (`bool`) if `True`, loads the model from `summary_dir`
- `random_seed` (`int`) seed for the random number generator (not fully implemented)
- `buffer_size` (`int`) replay buffer size
- `max_episodes` (`int`) maximum number of training episodes
- `max_episode_len` (`int`) number of steps per episode
- `mini_batch_size` (`int`) batch size sampled from the replay buffer
- `actor_lr` (`float`) actor network learning rate
- `critic_lr` (`float`) critic network learning rate
- `gamma` (`float`) discount factor for long-term returns
- `noise_var` (`float`) initial variance of the exploration noise at each episode; it decays as the episode progresses (see the sketch after this list)
- `scaling` (`bool`) if `True`, scales the states before they are used for training
- `state_dim` (`int`) state-space dimension of the environment
- `action_dim` (`int`) action-space dimension
- `action_bound` (`float`) upper and lower bound of the actions
- `discretization_time` (`float`) discretization time step used by the environment
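
As an illustration of the `noise_var` behaviour described above, here is a sketch of Gaussian exploration noise whose variance decays over an episode; the linear decay schedule is an assumption, and the repository may use a different one:

```python
import numpy as np

def exploration_noise(noise_var, step, max_episode_len):
    """Gaussian exploration noise whose variance starts at noise_var
    and decays linearly to zero over the episode (assumed schedule)."""
    var = noise_var * (1.0 - step / max_episode_len)
    return np.random.normal(0.0, np.sqrt(max(var, 0.0)))

# Noise is large early in a 200-step episode and small near the end
print(exploration_noise(0.5, 10, 200))
print(exploration_noise(0.5, 190, 200))
```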

### The actor and critic networks are implemented using an LSTM plus two hidden layers

- `time_steps` (`int`) number of time steps for the RNN (LSTM)
- `actor_rnn` (`int`) actor network RNN layer parameters
- `actor_l1` (`int`) actor network layer 1 parameters
- `actor_l2` (`int`) actor network layer 2 parameters
- `critic_rnn` (`int`) critic network RNN layer parameters
- `critic_l1` (`int`) critic network layer 1 parameters
- `critic_l2` (`int`) critic network layer 2 parameters
- `tau` (`float`) soft update rate for the target networks
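
A minimal TensorFlow 2 (Keras) sketch of an actor network with this shape, together with the soft target update governed by `tau`; the layer sizes and the scaling of the output by `action_bound` are illustrative assumptions, not the repository's exact architecture:

```python
import tensorflow as tf

def build_actor(time_steps, state_dim, action_dim, action_bound,
                actor_rnn=64, actor_l1=128, actor_l2=64):
    """LSTM followed by two hidden layers; sizes are assumed defaults."""
    inputs = tf.keras.Input(shape=(time_steps, state_dim))
    x = tf.keras.layers.LSTM(actor_rnn)(inputs)
    x = tf.keras.layers.Dense(actor_l1, activation='relu')(x)
    x = tf.keras.layers.Dense(actor_l2, activation='relu')(x)
    # tanh keeps the raw action in [-1, 1]; scaling maps it to the bound
    raw = tf.keras.layers.Dense(action_dim, activation='tanh')(x)
    out = tf.keras.layers.Lambda(lambda a: a * action_bound)(raw)
    return tf.keras.Model(inputs, out)

def soft_update(target, source, tau):
    """Polyak averaging: target <- tau * source + (1 - tau) * target."""
    target.set_weights([tau * s + (1.0 - tau) * t
                        for s, t in zip(source.get_weights(),
                                        target.get_weights())])
```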

# Dependencies

Written in TensorFlow 2.0 (Keras)

Requires the following PyPI packages:

```
import matplotlib.pyplot as plt
import numpy as np