Description
Problem
Thank you very much for the imitation learning library. I have recently been using it together with SB3 to solve some problems, and it has been really helpful.
While implementing the DAgger algorithm, I found that "expert_policy", the most critical argument of "SimpleDAggerTrainer" (the implementation recommended in the docs), does not support callable expert policies. Behavior cloning does support them (in fact, that is how I use it). After a careful search, I found the reason: the code of "SimpleDAggerTrainer" hard-codes "deterministic_policy=True" when it generates trajectories:
    trajectories = rollout.generate_trajectories(
        policy=self.expert_policy,
        venv=collector,
        sample_until=sample_until,
        deterministic_policy=True,
        rng=collector.rng,
    )
which leads to an error in "policy_to_callable" in "rollout.py":
    "Cannot set deterministic_policy=True when policy is a callable, "
    "since deterministic_policy argument is ignored.",
In principle, "expert_policy" could be any callable, as long as it returns an action for each observation it is given. I would still like to run DAgger with an expert policy that I define myself (rather than one that inherits from BasePolicy in SB3). Is there a convenient way to do this, or do you have other advice?