Run Fluid with EDL #35

@typhoonzero

Description

Tasks

  • Full fault-tolerant training.
  • Dynamic trainer count on the pserver side, so that gradients can be averaged according to the current trainer count.
  • Upgrade the EDL controller to a CRD so that we can support Kubernetes versions higher than v1.8.
  • A tutorial on running a distributed lookup sparse table with EDL.
  • Update the experiment report: https://github.com/PaddlePaddle/cloud/tree/develop/doc/edl/experiment
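
To make the dynamic trainer count task more concrete, here is a minimal sketch of averaging gradients over whichever trainers are currently alive, rather than over a fixed count. It is not Fluid's or EDL's actual pserver code: the class `GradientAverager` and its methods `submit`, `remove_trainer`, and `average` are hypothetical names chosen for illustration, and gradients are modelled as plain lists of floats.

```python
from typing import Dict, List


class GradientAverager:
    """Hypothetical pserver-side accumulator (illustration only, not a Fluid API)."""

    def __init__(self) -> None:
        # Latest gradient received from each live trainer in the current step.
        self._pending: Dict[str, List[float]] = {}

    def submit(self, trainer_id: str, grad: List[float]) -> None:
        """Record the gradient sent by one trainer for the current step."""
        self._pending[trainer_id] = list(grad)

    def remove_trainer(self, trainer_id: str) -> None:
        """Drop a trainer that was scaled down or failed during the step."""
        self._pending.pop(trainer_id, None)

    def average(self) -> List[float]:
        """Average over the trainers present now, then reset for the next step."""
        if not self._pending:
            raise ValueError("no gradients received from any trainer")
        grads = list(self._pending.values())
        dim = len(grads[0])
        if any(len(g) != dim for g in grads):
            raise ValueError("gradient shapes differ between trainers")
        count = len(grads)  # the current trainer count, not a fixed constant
        averaged = [sum(g[i] for g in grads) / count for i in range(dim)]
        self._pending.clear()
        return averaged


if __name__ == "__main__":
    averager = GradientAverager()
    averager.submit("trainer-0", [2.0, 4.0])
    averager.submit("trainer-1", [4.0, 8.0])
    averager.submit("trainer-2", [6.0, 12.0])
    averager.remove_trainer("trainer-1")  # trainer leaves mid-step
    print(averager.average())  # divides by 2, the number of remaining trainers
```

The point of the sketch is that the divisor is taken from the set of trainers that actually contributed, so scaling the job up or down does not bias the update.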
