49 commits
df120ee
Add 2Dos in Readme and comments in peaks/peaks.cfg
Jul 9, 2019
6934e6f
Make todolist from Readme tasks
Jul 9, 2019
1c46191
Clang-format the whole project for readability
Jul 9, 2019
8d1558a
Add SOLIDs to Readme
Jul 9, 2019
62cade6
Add CHANGELOG and format README
Jul 9, 2019
72abcfd
Add .vscode and tags to gitignore
Jul 9, 2019
46422eb
Update README.md
Jul 10, 2019
1969454
Update README.md
Jul 10, 2019
7ed2621
Remove tags file from git repo without deleting it locally
Jul 10, 2019
06a3fa8
Switch to Google's code style
Jul 10, 2019
788cb0c
Add .idea to .gitignore
Jul 11, 2019
4c7b8ec
Add comment to explain .idea exclusion to .gitignore
Jul 11, 2019
617ed0b
Merge pull request #4 from rscircus/feature/demo_correction
Jul 11, 2019
2631d36
Fix linebreak according to comments in review
Jul 12, 2019
58caf09
Merge branch 'develop' into feature/todo_roland
Jul 12, 2019
71c65db
Update README.md
Jul 12, 2019
a628d11
Merge pull request #1 from rscircus/feature/todo_roland
Jul 13, 2019
0abbaee
Walk through main and understand / add TODOs
Jul 10, 2019
f9b8c51
Add full finite differences as option
Jul 10, 2019
7f25feb
Revision after code style change in develop
Jul 12, 2019
cf67149
Update README.md
Jul 12, 2019
813c5c9
Identify Simultaneous Layer-Parallel Training algorithm in main.cpp
Jul 12, 2019
b75783b
Add .idea to .gitignore
Jul 11, 2019
60d270b
Slash adjoint dot test and full finite differences
Jul 13, 2019
3ca3692
Enhance readability of STDOUT
Jul 14, 2019
bd265a3
Add comment on optimization loop
Jul 14, 2019
af5a012
Fix whitespace
Jul 14, 2019
91f3f44
Merge pull request #3 from rscircus/feature/main_refactoring
Jul 14, 2019
79d5437
Fix null pointer constant in HessianApprox
Jul 14, 2019
cf65620
Write output file in same folder where main is called.
Jul 23, 2019
bfeba57
Update testing script and output files.
Jul 23, 2019
63a0d5b
Update peaks example config.
Jul 23, 2019
4910a53
Update testing script and output files.
Jul 23, 2019
e4f4582
Write output file in same folder where main is called.
Jul 23, 2019
5822b85
Add list of contributors to README
Jul 23, 2019
f76f63f
Add license and paper header
Jul 23, 2019
85f6622
Initialize hessian with NULL
Jul 23, 2019
289a388
Merge branch 'master' into develop
Jul 23, 2019
9be5dd6
Update email address in Readme.me Contributors
steffi7574 Jul 23, 2019
713d1ac
Merge pull request #3 from steffi7574/develop
Jul 23, 2019
30dbd62
Merge pull request #5 from steffi7574/fix/EmailAddress
Jul 24, 2019
ae5160a
Merge pull request #6 from steffi7574/develop
Jul 24, 2019
c95ce68
Merge pull request #4 from steffi7574/fix/NullPointerConstant
steffi7574 Jul 25, 2019
ff85f78
Merge pull request #7 from steffi7574/develop
steffi7574 Jul 25, 2019
0394990
Merge branch 'master' of https://github.com/steffi7574/DNN_PinT
Jul 25, 2019
969d65c
Minor comment changes in main
Jul 25, 2019
d7c0fbf
Adding some comments for Network and Layer headers.
Aug 20, 2019
16f3fc5
more comments
Aug 20, 2019
6978cdd
more comments
Aug 20, 2019
6 changes: 6 additions & 0 deletions .gitignore
@@ -1,7 +1,13 @@
# cpp
build/
*out*
*.o
*.pyc
*run_*
*optim.dat
main

# Ignore IDE configurations
.idea
.vscode
tags
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,17 @@
# Changelog
All notable high-level changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.0.1] - 2019-07-09
### Added
- This CHANGELOG file.
- Format project using `clang-format`
- README contains WIP elements, which will be eliminated after completion

## [1.0.0] - 2019-07-09
### Added
- The project in its state as of the following publication: TODO
19 changes: 13 additions & 6 deletions Readme.md → README.md
@@ -1,19 +1,26 @@
# Layer-parallel training of deep residual neural networks

This code performs layer-parallel training of deep neural networks of residual type. It utilizes the parallel-in-time software library [XBraid](https://github.com/XBraid/xbraid) to distribute the layers of the network to different compute units. Instead of sequential forward and backward propagation through the network, iterative multigrid updates are performed in parallel to solve for the network propagation and the training simultaneously. See the paper [Guenther et al.](https://arxiv.org/pdf/1812.04352.pdf) for a description of the method and all details.

## Build

The repository includes XBraid as a submodule. To clone both, use either `git clone --recurse-submodules [...]` for Git version >= 2.13, or `git clone [...]` followed by `cd xbraid`, `git submodule init` and `git submodule update` for older Git versions.

Type `make` in the main directory to build both the code and the XBraid library.
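
As one concrete sequence (a sketch: the repository URL is taken from the merge commits listed above, and the checkout directory name is assumed accordingly):

```sh
# Git >= 2.13: clone the code and the XBraid submodule in one step
git clone --recurse-submodules https://github.com/steffi7574/DNN_PinT

# Older Git versions: clone first, then fetch the submodule explicitly
git clone https://github.com/steffi7574/DNN_PinT
cd DNN_PinT
git submodule init
git submodule update

# Build both the code and the XBraid library
make
```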

## Run

Test cases are located in the `examples/` subfolder. Each example contains a `*.cfg` file that holds configuration options for the example's dataset, the layer-parallelization with XBraid, and the optimization method and parameters.

Run a test case by calling `./main` with the corresponding configuration file, e.g. `./main examples/peaks/peaks.cfg`.
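
A minimal session for the peaks example (the `mpirun` invocation is an assumption: XBraid distributes layers over MPI ranks, but the README itself only shows the serial call):

```sh
# Serial call, as documented above
./main examples/peaks/peaks.cfg

# Hypothetical parallel call; the rank count is chosen arbitrarily
mpirun -np 4 ./main examples/peaks/peaks.cfg
```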

## Output

An optimization history file `optim.dat` will be flushed to the example's subfolder.
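
Note that the commits above ("Write output file in same folder where main is called") suggest that newer revisions place `optim.dat` in the directory from which `main` is invoked. A quick way to watch the optimization history during training:

```sh
tail -f optim.dat   # follow the history file as it is flushed
```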

## Contributors

* Stefanie Guenther <guenther5@llnl.gov>
* Eric C. Cyr <eccyr@sandia.gov>
* J.B. Schroder <jbschroder@unm.edu>
* Roland A. Siegbert <roland.siegbert@rwth-aachen.de>
24 changes: 15 additions & 9 deletions examples/peaks/peaks.cfg
@@ -3,7 +3,7 @@
################################

# relative data folder location
-datafolder = examples/peaks
+datafolder = ./
# filename of training data feature vectors
ftrain_ex = features_training.dat
# filename of training data labels/classes
@@ -35,7 +35,7 @@ nchannels = 8
# number of layers (including opening layer and classification layer) (nlayer >= 3 !)
nlayers = 32
# final time
-T = 5.0
+T = 1.0
# Activation function ("tanh" or "ReLu" or "SmoothReLu")
activation = SmoothReLu
# Type of network ("dense" the default, or "convolutional")
@@ -47,7 +47,7 @@ type_openlayer = activate
# factor for scaling initial opening layer weights and bias
weights_open_init = 1e-3
# factor for scaling initial weights and bias of intermediate layers
-weights_init = 0e-3
+weights_init = 1e-3
# factor for scaling initial classification weights and bias
weights_class_init = 1e-3

@@ -66,7 +66,7 @@ braid_maxlevels = 10
# minimum allowed coarse time time grid size (values in 10-30 are usually best)
braid_mincoarse = 10
# maximum number of iterations
-braid_maxiter = 15
+braid_maxiter = 2
# absolute tolerance
braid_abstol = 1e-15
# absolute adjoint tolerance
@@ -88,13 +88,19 @@ braid_nrelax0 = 0
# Optimization
####################################
# Type of batch selection ("deterministic" or "stochastic")
+# deterministic:
+#   the batch is fixed once => training always uses this same batch
+#
+# stochastic:
+#   batch elements are randomly chosen in each iteration during training;
+#   a smaller batch size makes sense here
batch_type = deterministic
# Batch size
nbatch = 5000
# relaxation param for tikhonov term
gamma_tik = 1e-7
# relaxation param for time-derivative term
-gamma_ddt = 1e-7
+gamma_ddt = 1e-5
# relaxation param for tikhonov term of classification weights
gamma_class = 1e-7
# stepsize selection type ("fixed" or "backtrackingLS" or "oneoverk")
@@ -106,19 +112,19 @@ stepsize_type = backtrackingLS
# initial stepsize
stepsize = 1.0
# maximum number of optimization iterations
-optim_maxiter = 10
+optim_maxiter = 130
# absolute stopping criterion for the gradient norm
gtol = 1e-4
# maximum number of linesearch iterations
-ls_maxiter = 20
+ls_maxiter = 15
# factor for modifying the stepsize within a linesearch iteration
ls_factor = 0.5
# Hessian Approximation ("BFGS", "L-BFGS" or "Identity")
hessian_approx = L-BFGS
# number of stages for l-bfgs method
-lbfgs_stages = 20
+lbfgs_stages = 10
# level for validation computation:
# -1 = never validate
# 0 = validate only after optimization finishes.
# 1 = validate in each optimization iteration
-validationlevel = 0
+validationlevel = 1