A suggestion for the "SimpleDAggerTrainer" function #693

## Problem
Thank you very much for the imitation learning library. I have recently been using it in combination with SB3 to solve some problems, and it has been really helpful.

While implementing the DAgger algorithm, I found that **SimpleDAggerTrainer**, the implementation recommended in the docs, does not accept a callable object for its most important argument, **expert_policy**. Behavior cloning does support this (in fact, that is how I use it). After a careful search, I found the reason: the code of **SimpleDAggerTrainer** forces **deterministic_policy=True** when it generates trajectories:

```python
trajectories = rollout.generate_trajectories(
    policy=self.expert_policy,
    venv=collector,
    sample_until=sample_until,
    deterministic_policy=True,  # hard-coded, so a callable expert is rejected
    rng=collector.rng,
)
```



This leads to an error in **policy_to_callable** in **rollout.py**:

> Cannot set deterministic_policy=True when policy is a callable, since deterministic_policy argument is ignored.
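
To illustrate why the check fires, here is a simplified, self-contained stand-in for the dispatch described above. It is not the library's code: `PolicyLike`, `policy_to_callable_sketch`, and `my_expert` are hypothetical names, and only the error message is taken from the output quoted above.

```python
from typing import Callable, List, Union


class PolicyLike:
    """Hypothetical stand-in for a policy object that has a predict() method."""

    def predict(self, obs: List[float], deterministic: bool = False) -> List[int]:
        # Toy rule: action 1 for positive observations, else 0.
        return [1 if o > 0 else 0 for o in obs]


def policy_to_callable_sketch(
    policy: Union[PolicyLike, Callable[[List[float]], List[int]]],
    deterministic_policy: bool = False,
) -> Callable[[List[float]], List[int]]:
    """Simplified model of the dispatch; not the library's implementation."""
    if isinstance(policy, PolicyLike):
        # Policy objects can honour the deterministic flag via predict().
        return lambda obs: policy.predict(obs, deterministic=deterministic_policy)
    if callable(policy):
        # A bare callable cannot receive the flag, so True is rejected.
        if deterministic_policy:
            raise ValueError(
                "Cannot set deterministic_policy=True when policy is a callable, "
                "since deterministic_policy argument is ignored."
            )
        return policy
    raise TypeError(f"Unsupported policy type: {type(policy)}")


def my_expert(obs: List[float]) -> List[int]:
    """A user-defined callable expert (hypothetical)."""
    return [1 if o > 0 else 0 for o in obs]


# A callable is accepted as long as the flag is left at False ...
get_actions = policy_to_callable_sketch(my_expert, deterministic_policy=False)
result_ok = get_actions([0.5, -0.2])

# ... but a hard-coded deterministic_policy=True raises the error.
try:
    policy_to_callable_sketch(my_expert, deterministic_policy=True)
    raised = False
except ValueError:
    raised = True
```

In this model the callable itself is fine; it is only the combination with a forced `deterministic_policy=True` that is rejected.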

In principle, **expert_policy** could be any callable object, as long as it outputs the corresponding action for a given observation. I would still like the DAgger algorithm to work with an expert policy that I define myself (instead of a policy inherited from BasePolicy in SB3). Is there a convenient way to do this, or do you have better advice?
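
One possible workaround, sketched here under assumptions rather than as the library's recommended approach, is to wrap the callable in a thin policy class so that it is treated as a policy object instead of a bare callable. The sketch uses a hypothetical `StandInBasePolicy` so that it runs on its own; with the real libraries the wrapper would presumably subclass SB3's `BasePolicy` and override `_predict`, which operates on tensors rather than plain lists. `CallableExpertPolicy` and `my_expert` are also illustrative names.

```python
from typing import Callable, List, Optional, Tuple

Observation = List[float]
Action = int


class StandInBasePolicy:
    """Hypothetical stand-in for a policy base class with a predict() API."""

    def _predict(self, obs: Observation, deterministic: bool = False) -> Action:
        raise NotImplementedError

    def predict(
        self, observations: List[Observation], deterministic: bool = False
    ) -> Tuple[List[Action], Optional[object]]:
        # Returns (actions, state), the shape used by predict()-style APIs.
        actions = [self._predict(obs, deterministic) for obs in observations]
        return actions, None


class CallableExpertPolicy(StandInBasePolicy):
    """Wraps a plain observation -> action function as a policy object."""

    def __init__(self, expert_fn: Callable[[Observation], Action]) -> None:
        self._expert_fn = expert_fn

    def _predict(self, obs: Observation, deterministic: bool = False) -> Action:
        # The wrapped expert is assumed to be deterministic already,
        # so the flag is accepted and has no effect.
        return self._expert_fn(obs)


def my_expert(obs: Observation) -> Action:
    """Hypothetical hand-written expert: act on the sign of obs[0]."""
    return 1 if obs[0] > 0 else 0


expert_policy = CallableExpertPolicy(my_expert)
actions, _ = expert_policy.predict([[0.3, 1.0], [-0.7, 0.0]], deterministic=True)
```

Because the wrapper exposes `predict`, passing `deterministic=True` no longer conflicts with a callable check. Whether this fits a given setup depends on the observation and action spaces involved.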



