From 838a94d524dcecade303fc9c4396e688545f591c Mon Sep 17 00:00:00 2001
From: Justyna Ilczuk
Date: Tue, 8 Jul 2025 12:23:07 +0200
Subject: [PATCH] Fix typos in docs

---
 docs/main-concepts/reward_networks.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/main-concepts/reward_networks.rst b/docs/main-concepts/reward_networks.rst
index 598041adc..e1ca7007c 100644
--- a/docs/main-concepts/reward_networks.rst
+++ b/docs/main-concepts/reward_networks.rst
@@ -73,7 +73,7 @@ There are two types of wrapper:
 
 * :class:`PredictProcessedWrapper ` modifies the predict_processed call to the reward network. Thus this type of reward network wrapper is designed to only modify the reward when it is being used to train/evaluate a policy but *not* when we are taking gradients on it. Thus it does not have to be differentiable.
 
-The most commonly used is the :class:`NormalizedRewardNet ` which is a predict procssed wrapper. This class uses a normalization layer to standardize the *output* of the reward function using its running mean and variance, which is useful for stabilizing training. When a reward network is saved, its wrappers are saved along with it, so that the normalization fit during reward learning can be used during future policy learning or evaluation.
+The most commonly used is the :class:`NormalizedRewardNet ` which is a predict processed wrapper. This class uses a normalization layer to standardize the *output* of the reward function using its running mean and variance, which is useful for stabilizing training. When a reward network is saved, its wrappers are saved along with it, so that the normalization fit during reward learning can be used during future policy learning or evaluation.
 
 .. testcode::
     :skipif: skip_doctests
@@ -86,7 +86,7 @@ The most commonly used is the :class:`NormalizedRewardNet ` which is a predict
 
 .. note::
 
-    The reward normalization wrapper does _not_ function identically to stable baselines3's `VecNormalize `_ environment wrapper. First, it does not normalize the observations. Second, unlike ``VecNormalize``, it scales and centers the reward using the base rewards's mean and variance. The ``VecNormalizes`` scales the reward down using a running estimate of the _return_.
+    The reward normalization wrapper does _not_ function identically to stable baselines3's `VecNormalize `_ environment wrapper. First, it does not normalize the observations. Second, unlike ``VecNormalize``, it scales and centers the reward using the base rewards's mean and variance. The ``VecNormalize`` scales the reward down using a running estimate of the _return_.
 
 By default, the normalization wrapper updates the normalization on each call to ``predict_processed``. This behavior can be altered as shown below.
 