Problem
GAIL and AIRL can be slow to train because they run RL in the inner loop, which is computationally expensive and can require many environment interactions. Behavioral cloning is supervised learning, so it is fast and needs no environment interactions; however, its peak performance in complex environments is often weaker than that of GAIL and AIRL.
Solution
Add an option to train a policy from the demonstrations with BC, and then fine-tune that policy using GAIL/AIRL on the same set of demonstrations. The Stable Baselines v2 implementation of GAIL supported this option, for example.
This may not always help. If BC learns a bad policy, we could get stuck in a local minimum. For AIRL, the resulting reward function might also be more fragile, as the transition distribution seen during training will be more limited (for GAIL the discriminator could be more fragile as well, but it was never intended to be reused).
We already have a BC implementation, so I think this should probably be a feature added to the train_imitation script rather than to the algorithm itself. That said, if it ends up being an involved implementation, adding a helper method could be appropriate so that people using the Python API directly can also benefit; a rough sketch of that usage is below.
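For reference, here is a minimal, hypothetical sketch of how the two phases could be chained through the Python API. It assumes recent versions of `imitation` and stable-baselines3; the exact constructor arguments (e.g. `rng` and `policy=` on `bc.BC`) vary between releases, and loading of the demonstrations is elided.

```python
"""Sketch: warm-start with BC, then fine-tune the same policy with GAIL.

Not exact code for any particular imitation release; treat the constructor
arguments as assumptions to check against the installed version.
"""
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

from imitation.algorithms import bc
from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet
from imitation.util.networks import RunningNorm

rng = np.random.default_rng(0)
venv = make_vec_env("CartPole-v1", n_envs=8)
demonstrations = ...  # sequence of expert trajectories, loaded elsewhere

# Generator policy shared by both phases.
learner = PPO("MlpPolicy", venv, verbose=0)

# Phase 1: behavioral cloning, training the PPO policy in place.
bc_trainer = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    policy=learner.policy,
    demonstrations=demonstrations,
    rng=rng,
)
bc_trainer.train(n_epochs=10)

# Phase 2: fine-tune the warm-started policy with GAIL on the same data.
reward_net = BasicRewardNet(
    venv.observation_space,
    venv.action_space,
    normalize_input_layer=RunningNorm,
)
gail_trainer = GAIL(
    demonstrations=demonstrations,
    demo_batch_size=32,
    venv=venv,
    gen_algo=learner,
    reward_net=reward_net,
)
gail_trainer.train(total_timesteps=100_000)
```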
Possible alternative solutions
We could potentially support this more broadly, even for algorithms that don't learn from demonstrations such as the preference_comparisons module, in case users have both demonstrations and preference comparisons and want to learn from both (a setting studied in e.g. Ibarz et al., 2018). I don't think it's worth going out of our way to support this, but if it factors out nicely (e.g. as some extra Sacred ingredient to warm-start the policy, sketched below), it could be worth adding.
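A hypothetical shape for such an ingredient is below; the names (`bc_warmstart`, `warm_start_policy`, the config keys) are invented for illustration and are not part of imitation's existing scripts.

```python
"""Hypothetical Sacred ingredient for warm-starting a policy with BC."""
from sacred import Ingredient

bc_warmstart_ingredient = Ingredient("bc_warmstart")


@bc_warmstart_ingredient.config
def config():
    enabled = False  # whether to pretrain the policy with BC
    n_epochs = 10  # BC epochs before handing off to GAIL/AIRL/etc.


@bc_warmstart_ingredient.capture
def warm_start_policy(policy, demonstrations, venv, rng, enabled, n_epochs):
    """Optionally pretrain `policy` on `demonstrations` with BC, in place."""
    if not enabled:
        return policy
    from imitation.algorithms import bc

    bc_trainer = bc.BC(
        observation_space=venv.observation_space,
        action_space=venv.action_space,
        policy=policy,
        demonstrations=demonstrations,
        rng=rng,
    )
    bc_trainer.train(n_epochs=n_epochs)
    return bc_trainer.policy
```

Each training script would then call `warm_start_policy` on its generator policy before starting its main loop, with the `enabled` flag defaulting to off so existing configs are unaffected.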