A suggestion for the "SimpleDAggerTrainer" class #693

@wlxer

Problem

Thank you very much for the imitation learning library. I have recently been using it together with SB3 to solve some problems, and it has been really helpful.

While implementing the DAgger algorithm, I found that the expert_policy argument of "SimpleDAggerTrainer", the implementation recommended in the docs, does not accept a callable expert policy. Behavior cloning does support this (in fact, that is how I use it). After a careful search, I found the reason: the code of "SimpleDAggerTrainer" forces "deterministic_policy=True",

trajectories = rollout.generate_trajectories(
    policy=self.expert_policy,
    venv=collector,
    sample_until=sample_until,
    deterministic_policy=True,
    rng=collector.rng,
)

which leads to an error in "policy_to_callable" in "rollout.py":

"Cannot set deterministic_policy=True when policy is a callable, "
"since deterministic_policy argument is ignored.",

In principle, expert_policy could be any callable, as long as it returns an action for a given observation. I would still like DAgger to work with an expert policy that I define myself (rather than a policy inheriting from BasePolicy in SB3). Is there a convenient way to do this, or do you have better advice?
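To make the request concrete, here is a minimal, library-free sketch of the situation and of one possible workaround: wrapping the callable in a policy-like adapter so that the deterministic flag can be accepted and ignored. Everything in it is hypothetical and for illustration only. `PolicyLike`, `CallableAsPolicy`, `to_callable`, and `my_expert` are made-up names, and `to_callable` is only a simplified model of the check described above, not the actual code in rollout.py.

```python
class PolicyLike:
    """Hypothetical stand-in for an SB3-style policy exposing predict()."""

    def predict(self, obs_batch, deterministic=False):
        raise NotImplementedError


class CallableAsPolicy(PolicyLike):
    """Adapter that wraps a plain function so it passes the policy-type check."""

    def __init__(self, expert_fn):
        self._expert_fn = expert_fn

    def predict(self, obs_batch, deterministic=False):
        # The wrapped function is assumed to be deterministic already,
        # so the flag is accepted and ignored.
        return self._expert_fn(obs_batch)


def to_callable(policy, deterministic_policy=False):
    """Simplified model of the check from the issue (not the library's code)."""
    if isinstance(policy, PolicyLike):
        return lambda obs_batch: policy.predict(
            obs_batch, deterministic=deterministic_policy
        )
    if callable(policy):
        if deterministic_policy:
            raise ValueError(
                "Cannot set deterministic_policy=True when policy is a callable, "
                "since deterministic_policy argument is ignored."
            )
        return policy
    raise TypeError(f"unsupported policy type: {type(policy)!r}")


def my_expert(obs_batch):
    # Toy rule-based expert: action 1 if the first feature is positive, else 0.
    return [1 if obs[0] > 0 else 0 for obs in obs_batch]


batch = [[0.5, 0.0], [-1.0, 2.0]]

try:
    to_callable(my_expert, deterministic_policy=True)
except ValueError as err:
    print("bare callable rejected:", err)

wrapped = to_callable(CallableAsPolicy(my_expert), deterministic_policy=True)
print(wrapped(batch))
```

With the real library, the analogous adapter would presumably have to subclass BasePolicy from SB3 and override its `_predict` method; that variant is an untested assumption and is not shown here.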

Labels: enhancement
