From d62c50ab97bc8c9a85c451cfd07a1a15c0a70e17 Mon Sep 17 00:00:00 2001
From: Nikolay Chechulin
Date: Fri, 15 Jan 2021 23:33:25 +0300
Subject: [PATCH 1/2] format document

---
 README.md | 65 +++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 33 insertions(+), 32 deletions(-)

diff --git a/README.md b/README.md
index 39dbe03..69cb0f4 100644
--- a/README.md
+++ b/README.md
@@ -3,82 +3,83 @@
 This contains basic tools for implementing Reinforcement Learning algorithms and gym environments. Mainly aiming for systems with continious state space and action space.
 ## gym environments:
+
 - [DC-DC buck converter](rl/gym_env/buck.py)
 - [DC-DC boost converter](rl/gym_env/boost.py)
 - [four node buck (DC) microgrid](rl/gym_env/buck_microgrid.py)
+
 ## RL algorithms
-- ```buck_ddpg``` run DDPG on a simple buck converter environment.
-![DC-DC buck converter](results/results_plot_nice.png)
+
+- `buck_ddpg` run DDPG on a simple buck converter environment.
+  ![DC-DC buck converter](results/results_plot_nice.png)
 # How to use?
-```python buck_ddpg --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01 summary_dir='./results_buck_ddps'```
+
+`python buck_ddpg --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01 summary_dir='./results_buck_ddps'`
 will run the ddpg algorithm on buck converter, with discount factor = 0.9, for 100 episodes, and actor and critic learning rates 0.0001, 0.01, respectively. Finally saves the results in the folder = './results_buck_ddps' (the folder should be available)
 # Complete argument list:
 Use argparse to set the parameters of the desired experiment. Running buck_ddpg.py as a script will then output the results to a named and dated directory in the results folder.
-```summary_dir``` folder path to load and save the model. Saved all the results in .mat format.
+`summary_dir` folder path to load and save the model. Saved all the results in .mat format.
-```save_model``` (```bool```) if ```True``` saves the model in the ```summary_dir```
+`save_model` (`bool`) if `True` saves the model in the `summary_dir`
-```load_model``` (```bool```) if ```True``` loads the model in the ```summary_dir```
+`load_model` (`bool`) if `True` loads the model in the `summary_dir`
-```random_seed``` (```int```) seeding the random number generator (NOT completely implemented)
+`random_seed` (`int`) seeding the random number generator (NOT completely implemented)
-```buffer_size``` (```int```) replay buffer size
+`buffer_size` (`int`) replay buffer size
-```max_episodes``` (```int```) max number of episodes for training
+`max_episodes` (`int`) max number of episodes for training
-```max_episode_len``` (```int```) Number of steps per epsiode
+`max_episode_len` (`int`) Number of steps per epsiode
-```mini_batch_size``` (```int```) sampling batch size drawn from replay buffer
+`mini_batch_size` (`int`) sampling batch size drawn from replay buffer
-```actor_lr``` (```float```) actor network learning rate
+`actor_lr` (`float`) actor network learning rate
-```critic_lr``` (```float```) critic network learning rate
+`critic_lr` (`float`) critic network learning rate
-```gamma``` (```float```) models the long term returns (discount factor)
+`gamma` (`float`) models the long term returns (discount factor)
-```noise_var``` (```float```) starting variance of the exploration noise at each episode, and decreased as the episode progress
+`noise_var` (`float`) starting variance of the exploration noise at each episode, and decreased as the episode progress
-```scaling``` (```bool```) If ```True``` scales the states before using for training
+`scaling` (`bool`) If `True` scales the states before using for training
-```state_dim``` (```int```) state dimension of environment
+`state_dim` (`int`) state dimension of environment
-```action_dim``` (```int```) action space dimension
+`action_dim` (`int`) action space dimension
-```action_bound``` (```float```) upper and lower bound of the actions
+`action_bound` (`float`) upper and lower bound of the actions
-```discretization_time``` (```float```) discretization time used for the environment
+`discretization_time` (`float`) discretization time used for the environment
 ### Actor and Critic network is implemented using LSTM's + two hidden layers
-```time_steps``` (```int```) Number of time-steps for rnn (LSTM)
-
-```actor_rnn``` (```int```) actor network rnn layer paramerters
+`time_steps` (`int`) Number of time-steps for rnn (LSTM)
-```actor_l1``` (```int```) actor network layer 1 parameters
+`actor_rnn` (`int`) actor network rnn layer paramerters
-```actor_l2``` (```int```) actor network layer 2 parameters
+`actor_l1` (`int`) actor network layer 1 parameters
+`actor_l2` (`int`) actor network layer 2 parameters
+`critic_rnn` (`int`) critic network rnn layer paramerters
-```critic_rnn``` (```int```) critic network rnn layer paramerters
-
-```critic_l1``` (```int```) critic network layer 1 parameters
-
-```critic_l2``` (```int```) critic network layer 2 parameters
-
-```tau``` (```float```) target network learning rate
+`critic_l1` (`int`) critic network layer 1 parameters
+`critic_l2` (`int`) critic network layer 2 parameters
+`tau` (`float`) target network learning rate
 # Dependencies
 Written in TensorFlow 2.0 (Keras)
 Requires the following PiPy packages
+
 ```
 import matplotlib.pyplot as plt
 import numpy as np

From 41817ded17126442384e230eb1683b79a9952dfd Mon Sep 17 00:00:00 2001
From: Nikolay Chechulin
Date: Fri, 15 Jan 2021 23:35:38 +0300
Subject: [PATCH 2/2] format argument lists

---
 README.md | 73 +++++++++++++++++++++++++------------------------------------------------
 1 file changed, 25 insertions(+), 48 deletions(-)

diff --git a/README.md b/README.md
index 69cb0f4..c91a9d6 100644
--- a/README.md
+++ b/README.md
@@ -22,57 +22,34 @@ will run the ddpg algorithm on buck converter, with discount factor = 0.9, for 1
 Use argparse to set the parameters of the desired experiment. Running buck_ddpg.py as a script will then output the results to a named and dated directory in the results folder.
-`summary_dir` folder path to load and save the model. Saved all the results in .mat format.
-
-`save_model` (`bool`) if `True` saves the model in the `summary_dir`
-
-`load_model` (`bool`) if `True` loads the model in the `summary_dir`
-
-`random_seed` (`int`) seeding the random number generator (NOT completely implemented)
-
-`buffer_size` (`int`) replay buffer size
-
-`max_episodes` (`int`) max number of episodes for training
-
-`max_episode_len` (`int`) Number of steps per epsiode
-
-`mini_batch_size` (`int`) sampling batch size drawn from replay buffer
-
-`actor_lr` (`float`) actor network learning rate
-
-`critic_lr` (`float`) critic network learning rate
-
-`gamma` (`float`) models the long term returns (discount factor)
-
-`noise_var` (`float`) starting variance of the exploration noise at each episode, and decreased as the episode progress
-
-`scaling` (`bool`) If `True` scales the states before using for training
-
-`state_dim` (`int`) state dimension of environment
-
-`action_dim` (`int`) action space dimension
-
-`action_bound` (`float`) upper and lower bound of the actions
-
-`discretization_time` (`float`) discretization time used for the environment
+- `summary_dir` folder path to load and save the model. Saved all the results in .mat format.
+- `save_model` (`bool`) if `True` saves the model in the `summary_dir`
+- `load_model` (`bool`) if `True` loads the model in the `summary_dir`
+- `random_seed` (`int`) seeding the random number generator (NOT completely implemented)
+- `buffer_size` (`int`) replay buffer size
+- `max_episodes` (`int`) max number of episodes for training
+- `max_episode_len` (`int`) Number of steps per epsiode
+- `mini_batch_size` (`int`) sampling batch size drawn from replay buffer
+- `actor_lr` (`float`) actor network learning rate
+- `critic_lr` (`float`) critic network learning rate
+- `gamma` (`float`) models the long term returns (discount factor)
+- `noise_var` (`float`) starting variance of the exploration noise at each episode, and decreased as the episode progress
+- `scaling` (`bool`) If `True` scales the states before using for training
+- `state_dim` (`int`) state dimension of environment
+- `action_dim` (`int`) action space dimension
+- `action_bound` (`float`) upper and lower bound of the actions
+- `discretization_time` (`float`) discretization time used for the environment
 ### Actor and Critic network is implemented using LSTM's + two hidden layers
-`time_steps` (`int`) Number of time-steps for rnn (LSTM)
-
-`actor_rnn` (`int`) actor network rnn layer paramerters
-
-`actor_l1` (`int`) actor network layer 1 parameters
-
-`actor_l2` (`int`) actor network layer 2 parameters
-
-`critic_rnn` (`int`) critic network rnn layer paramerters
-
-`critic_l1` (`int`) critic network layer 1 parameters
-
-`critic_l2` (`int`) critic network layer 2 parameters
-
-`tau` (`float`) target network learning rate
+- `time_steps` (`int`) Number of time-steps for rnn (LSTM)
+- `actor_rnn` (`int`) actor network rnn layer paramerters
+- `actor_l1` (`int`) actor network layer 1 parameters
+- `actor_l2` (`int`) actor network layer 2 parameters
+- `critic_rnn` (`int`) critic network rnn layer paramerters
+- `critic_l1` (`int`) critic network layer 1 parameters
+- `critic_l2` (`int`) critic network layer 2 parameters
+- `tau` (`float`) target network learning rate
 # Dependencies
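The README text in these patches describes an argparse-driven entry point, but the parser itself is not part of the diff. Below is a minimal Python sketch of what a parser for the listed flags could look like: the argument names and types follow the README list, while the file name, default values, and the use of `store_true` for the boolean flags are assumptions made purely for illustration, not the actual code in buck_ddpg.py.

```python
# Illustrative sketch only -- NOT the actual parser in buck_ddpg.py (not shown in this patch).
# Argument names/types follow the README list; defaults are invented for the example.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="DDPG on a DC-DC buck converter (sketch)")
    parser.add_argument("--summary_dir", type=str, default="./results",
                        help="folder to load/save the model; results are written in .mat format")
    # The README lists these as bool; store_true flags are one common way to expose them.
    parser.add_argument("--save_model", action="store_true", help="save the model in summary_dir")
    parser.add_argument("--load_model", action="store_true", help="load the model from summary_dir")
    parser.add_argument("--random_seed", type=int, default=0, help="random number generator seed")
    parser.add_argument("--buffer_size", type=int, default=100000, help="replay buffer size")
    parser.add_argument("--max_episodes", type=int, default=100, help="max number of training episodes")
    parser.add_argument("--max_episode_len", type=int, default=200, help="number of steps per episode")
    parser.add_argument("--mini_batch_size", type=int, default=64,
                        help="batch size sampled from the replay buffer")
    parser.add_argument("--actor_lr", type=float, default=1e-4, help="actor network learning rate")
    parser.add_argument("--critic_lr", type=float, default=1e-2, help="critic network learning rate")
    parser.add_argument("--gamma", type=float, default=0.9, help="discount factor")
    parser.add_argument("--noise_var", type=float, default=0.1,
                        help="initial exploration-noise variance per episode")
    parser.add_argument("--tau", type=float, default=1e-3, help="target network learning rate")
    return parser


if __name__ == "__main__":
    # e.g. python ddpg_args_sketch.py --gamma=0.9 --max_episodes=100 --actor_lr=0.0001 --critic_lr=0.01
    args = build_parser().parse_args()
    print(vars(args))
```

With a parser like this sketch, the `summary_dir` value in the README's example command would be passed with an explicit option prefix, i.e. `--summary_dir='./results_buck_ddps'`.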
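The `time_steps`, `actor_rnn`, `actor_l1`, and `actor_l2` arguments correspond to the "LSTM's + two hidden layers" actor mentioned in the README. The TensorFlow 2 (Keras) sketch below shows one plausible layout of such an actor; the layer sizes, activations, and the tanh scaling by `action_bound` are assumptions for illustration, not the repository's actual network code.

```python
# Rough sketch of an LSTM + two-hidden-layer actor; layer sizes and activations are
# assumed for illustration, not taken from the repository.
import tensorflow as tf


def build_actor(time_steps: int, state_dim: int, action_dim: int, action_bound: float,
                actor_rnn: int = 64, actor_l1: int = 400, actor_l2: int = 300) -> tf.keras.Model:
    states = tf.keras.Input(shape=(time_steps, state_dim))      # sequence of past states
    x = tf.keras.layers.LSTM(actor_rnn)(states)                 # recurrent feature extractor
    x = tf.keras.layers.Dense(actor_l1, activation="relu")(x)   # hidden layer 1
    x = tf.keras.layers.Dense(actor_l2, activation="relu")(x)   # hidden layer 2
    raw = tf.keras.layers.Dense(action_dim, activation="tanh")(x)        # action in [-1, 1]
    actions = tf.keras.layers.Lambda(lambda a: a * action_bound)(raw)    # scale to action bounds
    return tf.keras.Model(states, actions)


# Example: a hypothetical 2-state converter environment with one bounded action.
actor = build_actor(time_steps=5, state_dim=2, action_dim=1, action_bound=1.0)
actor.summary()
```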