Clipping

This subpackage contains modules for gradient clipping, normalization, centralization, and related operations.

Classes:

  • Centralize

    Centralizes the update.

  • ClipNorm

    Clips update norm to be no larger than max_norm.

  • ClipNormByEMA

    Clips norm to be no larger than the norm of an exponential moving average of past updates.

  • ClipNormGrowth

    Clips update norm growth.

  • ClipValue

    Clips update magnitude to be within (-value, value) range.

  • ClipValueByEMA

    Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.

  • ClipValueGrowth

    Clips update value magnitude growth.

  • Normalize

    Normalizes the update.

  • NormalizeByEMA

    Sets norm of the update to be the same as the norm of an exponential moving average of past updates.

Functions:

  • clip_grad_norm_

    Clips gradient of an iterable of parameters to specified norm value.

  • clip_grad_value_

    Clips gradient of an iterable of parameters at specified value.

  • normalize_grads_

    Normalizes gradient of an iterable of parameters to specified norm value.

Centralize

Bases: torchzero.core.transform.Transform

Centralizes the update.

Parameters:

  • dim (int | Sequence[int] | str | None, default: None ) –

    centralizes (subtracts the mean) along those dimensions. If list/tuple, tensors are centralized along all dimensions in dim that they have. Can be set to "global" to centralize by the global mean of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are centralized.

  • min_size (int, default: 2 ) –

    minimal size of a dimension to centralize along it. Defaults to 2.

Examples:

Standard gradient centralization:

opt = tz.Modular(
    model.parameters(),
    tz.m.Centralize(dim=0),
    tz.m.LR(1e-2),
)

References:

  • Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16 (pp. 635-652). Springer International Publishing. https://arxiv.org/abs/2004.01461

Source code in torchzero/modules/clipping/clipping.py
class Centralize(Transform):
    """Centralizes the update.

    Args:
        dim (int | Sequence[int] | str | None, optional):
            centralizes (subtracts the mean) along those dimensions.
            If list/tuple, tensors are centralized along all dimensions in `dim` that they have.
            Can be set to "global" to centralize by the global mean of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are centralized.
        min_size (int, optional):
            minimal size of a dimension to centralize along it. Defaults to 2.

    Examples:

    Standard gradient centralization:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Centralize(dim=0),
        tz.m.LR(1e-2),
    )
    ```

    References:
    - Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16 (pp. 635-652). Springer International Publishing. https://arxiv.org/abs/2004.01461
    """
    def __init__(
        self,
        dim: int | Sequence[int] | Literal["global"] | None = None,
        inverse_dims: bool = False,
        min_size: int = 2,
        target: Target = "update",
    ):
        defaults = dict(dim=dim,min_size=min_size,inverse_dims=inverse_dims)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        dim, min_size, inverse_dims = itemgetter('dim', 'min_size', 'inverse_dims')(settings[0])

        _centralize_(tensors_ = TensorList(tensors), dim=dim, inverse_dims=inverse_dims, min_size=min_size)

        return tensors

ClipNorm

Bases: torchzero.core.transform.Transform

Clips update norm to be no larger than max_norm.

Parameters:

  • max_norm (float) –

    value to clip norm to.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are normalized.

  • min_size (int, default: 1 ) –

    minimal number of elements in a parameter or slice to clip norm. Defaults to 1.

  • target (str, default: 'update' ) –

    what this affects.

Examples:

Gradient norm clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNorm(1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)

Update norm clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNorm(1),
    tz.m.LR(1e-2),
)

Source code in torchzero/modules/clipping/clipping.py
class ClipNorm(Transform):
    """Clips update norm to be no larger than `value`.

    Args:
        max_norm (float): value to clip norm to.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are normalized.
        min_size (int, optional):
            minimal number of elements in a parameter or slice to clip norm. Defaults to 1.
        target (str, optional):
            what this affects.

    Examples:

    Gradient norm clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.ClipNorm(1),
        tz.m.Adam(),
        tz.m.LR(1e-2),
    )
    ```

    Update norm clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Adam(),
        tz.m.ClipNorm(1),
        tz.m.LR(1e-2),
    )
    ```
    """
    def __init__(
        self,
        max_norm: float,
        ord: Metrics = 2,
        dim: int | Sequence[int] | Literal["global"] | None = None,
        inverse_dims: bool = False,
        min_size: int = 1,
        target: Target = "update",
    ):
        defaults = dict(max_norm=max_norm,ord=ord,dim=dim,min_size=min_size,inverse_dims=inverse_dims)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        max_norm = NumberList(s['max_norm'] for s in settings)
        ord, dim, min_size, inverse_dims = itemgetter('ord', 'dim', 'min_size', 'inverse_dims')(settings[0])
        _clip_norm_(
            tensors_ = TensorList(tensors),
            min = 0,
            max = max_norm,
            norm_value = None,
            ord = ord,
            dim = dim,
            inverse_dims=inverse_dims,
            min_size = min_size,
        )
        return tensors

ClipNormByEMA

Bases: torchzero.core.transform.Transform

Clips norm to be no larger than the norm of an exponential moving average of past updates.

Parameters:

  • beta (float, default: 0.99 ) –

    beta for the exponential moving average. Defaults to 0.99.

  • ord (float, default: 2 ) –

    order of the norm. Defaults to 2.

  • eps (float, default: 1e-06 ) –

    epsilon for division. Defaults to 1e-6.

  • tensorwise (bool, default: True ) –

    if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.

  • max_ema_growth (float | None, default: 1.5 ) –

    if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this factor per step. Defaults to 1.5.

  • ema_init (str, default: 'zeros' ) –

    How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
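
Examples:

A usage sketch (illustrative, not taken from the library's own docs), following the composition pattern of the other examples on this page:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNormByEMA(beta=0.99),
    tz.m.LR(1e-2),
)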

Source code in torchzero/modules/clipping/ema_clipping.py
class ClipNormByEMA(Transform):
    """Clips norm to be no larger than the norm of an exponential moving average of past updates.

    Args:
        beta (float, optional): beta for the exponential moving average. Defaults to 0.99.
        ord (float, optional): order of the norm. Defaults to 2.
        eps (float, optional): epsilon for division. Defaults to 1e-6.
        tensorwise (bool, optional):
            if True, norms are calculated parameter-wise, otherwise treats all parameters as single vector. Defaults to True.
        max_ema_growth (float | None, optional):
            if specified, restricts how quickly exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
        ema_init (str, optional):
            How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
    """
    NORMALIZE = False
    def __init__(
        self,
        beta=0.99,
        ord: Metrics = 2,
        eps=1e-6,
        tensorwise:bool=True,
        max_ema_growth: float | None = 1.5,
        ema_init: Literal['zeros', 'update'] = 'zeros',
        inner: Chainable | None = None,
    ):
        defaults = dict(beta=beta, ord=ord, tensorwise=tensorwise, ema_init=ema_init, eps=eps, max_ema_growth=max_ema_growth)
        super().__init__(defaults, inner=inner)

    @torch.no_grad
    def update_tensors(self, tensors, params, grads, loss, states, settings):
        tensors = TensorList(tensors)
        ord, tensorwise, ema_init, max_ema_growth = itemgetter('ord', 'tensorwise', 'ema_init', 'max_ema_growth')(settings[0])

        beta, eps = unpack_dicts(settings, 'beta', 'eps', cls=NumberList)

        ema = unpack_states(states, tensors, 'ema', init = (torch.zeros_like if ema_init=='zeros' else tensors), cls=TensorList)

        ema.lerp_(tensors, 1-beta)

        if tensorwise:
            ema_norm = ema.metric(ord)

            # clip ema norm growth
            if max_ema_growth is not None:
                prev_ema_norm = unpack_states(states, tensors, 'prev_ema_norm', init=ema_norm, cls=TensorList)
                allowed_norm = (prev_ema_norm * max_ema_growth).clip(min=1e-6)
                ema_denom = (ema_norm / allowed_norm).clip(min=1)
                ema.div_(ema_denom)
                ema_norm.div_(ema_denom)
                prev_ema_norm.set_(ema_norm)

            tensors_norm = tensors.norm(ord)
            denom = tensors_norm / ema_norm.clip(min=eps)
            if self.NORMALIZE: denom.clip_(min=eps)
            else: denom.clip_(min=1)

        else:
            ema_norm = ema.global_metric(ord)

            # clip ema norm growth
            if max_ema_growth is not None:
                prev_ema_norm = self.global_state.setdefault('prev_ema_norm', ema_norm)
                allowed_norm = prev_ema_norm * max_ema_growth
                if ema_norm > allowed_norm:
                    ema.div_(ema_norm / allowed_norm)
                    ema_norm = allowed_norm
                prev_ema_norm.set_(ema_norm)

            tensors_norm = tensors.global_metric(ord)
            denom = tensors_norm / ema_norm.clip(min=eps[0])
            if self.NORMALIZE: denom.clip_(min=eps[0])
            else: denom.clip_(min=1)

        self.global_state['denom'] = denom

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        denom = self.global_state.pop('denom')
        torch._foreach_div_(tensors, denom)
        return tensors

NORMALIZE class-attribute

NORMALIZE = False

ClipNormGrowth

Bases: torchzero.core.transform.Transform

Clips update norm growth.

Parameters:

  • add (float | None, default: None ) –

    additive clipping, next update norm is at most previous norm + add. Defaults to None.

  • mul (float | None, default: 1.5 ) –

    multiplicative clipping, next update norm is at most previous norm * mul. Defaults to 1.5.

  • min_value (float | None, default: 0.0001 ) –

    minimum value for multiplicative clipping to prevent collapse to 0. Next norm is at most max(prev_norm, min_value) * mul. Defaults to 1e-4.

  • max_decay (float | None, default: 2 ) –

    bounds the tracked multiplicative clipping decay to prevent collapse to 0. Next norm is at most max(previous norm * mul, max_decay). Defaults to 2.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • parameterwise (bool, default: True ) –

    if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.

  • target (Literal, default: 'update' ) –

    what to set on var. Defaults to "update".
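
Examples:

An illustrative sketch (not from the library docs) that limits how quickly the update norm may grow between steps:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNormGrowth(mul=1.5),
    tz.m.LR(1e-2),
)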

Source code in torchzero/modules/clipping/growth_clipping.py
class ClipNormGrowth(Transform):
    """Clips update norm growth.

    Args:
        add (float | None, optional): additive clipping, next update norm is at most `previous norm + add`. Defaults to None.
        mul (float | None, optional):
            multiplicative clipping, next update norm is at most `previous norm * mul`. Defaults to 1.5.
        min_value (float | None, optional):
            minimum value for multiplicative clipping to prevent collapse to 0.
            Next norm is at most :code:`max(prev_norm, min_value) * mul`. Defaults to 1e-4.
        max_decay (float | None, optional):
            bounds the tracked multiplicative clipping decay to prevent collapse to 0.
            Next norm is at most :code:`max(previous norm * mul, max_decay)`.
            Defaults to 2.
        ord (float, optional): norm order. Defaults to 2.
        parameterwise (bool, optional):
            if True, norms are calculated parameter-wise, otherwise treats all parameters as single vector. Defaults to True.
        target (Target, optional): what to set on var. Defaults to "update".
    """
    def __init__(
        self,
        add: float | None = None,
        mul: float | None = 1.5,
        min_value: float | None = 1e-4,
        max_decay: float | None = 2,
        ord: float = 2,
        parameterwise=True,
        target: Target = "update",
    ):
        defaults = dict(add=add, mul=mul, min_value=min_value, max_decay=max_decay, ord=ord, parameterwise=parameterwise)
        super().__init__(defaults, target=target)



    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        parameterwise = settings[0]['parameterwise']
        tensors = TensorList(tensors)

        if parameterwise:
            ts = tensors
            stts = states
            stns = settings

        else:
            ts = [tensors.to_vec()]
            stts = [self.global_state]
            stns = [settings[0]]


        for t, state, setting in zip(ts, stts, stns):
            if 'prev_norm' not in state:
                state['prev_norm'] = torch.linalg.vector_norm(t, ord=setting['ord']) # pylint:disable=not-callable
                state['prev_denom'] = 1
                continue

            _,  state['prev_norm'], state['prev_denom'] = norm_growth_clip_(
                tensor_ = t,
                prev_norm = state['prev_norm'],
                add = setting['add'],
                mul = setting['mul'],
                min_value = setting['min_value'],
                max_decay = setting['max_decay'],
                ord = setting['ord'],
            )

        if not parameterwise:
            tensors.from_vec_(ts[0])

        return tensors

ClipValue

Bases: torchzero.core.transform.Transform

Clips update magnitude to be within (-value, value) range.

Parameters:

  • value (float) –

    value to clip to.

  • target (str, default: 'update' ) –

    refer to target argument in documentation.

Examples:

Gradient clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValue(1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)

Update clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValue(1),
    tz.m.LR(1e-2),
)

Source code in torchzero/modules/clipping/clipping.py
class ClipValue(Transform):
    """Clips update magnitude to be within ``(-value, value)`` range.

    Args:
        value (float): value to clip to.
        target (str): refer to ``target argument`` in documentation.

    Examples:

    Gradient clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.ClipValue(1),
        tz.m.Adam(),
        tz.m.LR(1e-2),
    )
    ```

    Update clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Adam(),
        tz.m.ClipValue(1),
        tz.m.LR(1e-2),
    )
    ```

    """
    def __init__(self, value: float, target: Target = 'update'):
        defaults = dict(value=value)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        value = [s['value'] for s in settings]
        return TensorList(tensors).clip_([-v for v in value], value)

ClipValueByEMA

Bases: torchzero.core.transform.Transform

Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.

Parameters:

  • beta (float, default: 0.99 ) –

    beta for the exponential moving average. Defaults to 0.99.

  • ema_init (str, default: 'zeros' ) –

    How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.

  • ema_tfm (Chainable | None, default: None ) –

    optional modules applied to exponential moving average before clipping by it. Defaults to None.
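
Examples:

An illustrative sketch (not from the library docs), clipping raw gradient magnitudes by their EMA before Adam:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValueByEMA(beta=0.99),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)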

Source code in torchzero/modules/clipping/ema_clipping.py
class ClipValueByEMA(Transform):
    """Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.

    Args:
        beta (float, optional): beta for the exponential moving average. Defaults to 0.99.
        ema_init (str, optional):
            How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
        ema_tfm (Chainable | None, optional):
            optional modules applied to exponential moving average before clipping by it. Defaults to None.
    """
    def __init__(
        self,
        beta=0.99,
        ema_init: Literal['zeros', 'update'] = 'zeros',
        ema_tfm:Chainable | None=None,
        inner: Chainable | None = None,
    ):
        defaults = dict(beta=beta, ema_init=ema_init)
        super().__init__(defaults, inner=inner)

        if ema_tfm is not None:
            self.set_child('ema_tfm', ema_tfm)

    @torch.no_grad
    def update_tensors(self, tensors, params, grads, loss, states, settings):
        ema_init = itemgetter('ema_init')(settings[0])

        beta = unpack_dicts(settings, 'beta', cls=NumberList)
        tensors = TensorList(tensors)

        ema = unpack_states(states, tensors, 'ema', init = (torch.zeros_like if ema_init=='zeros' else lambda t: t.abs()), cls=TensorList)
        ema.lerp_(tensors.abs(), 1-beta)

    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        tensors = TensorList(tensors)
        ema = unpack_states(states, tensors, 'ema', cls=TensorList)

        if 'ema_tfm' in self.children:
            ema = TensorList(apply_transform(self.children['ema_tfm'], ema.clone(), params, grads, loss))

        tensors.clip_(-ema, ema)
        return tensors

ClipValueGrowth

Bases: torchzero.core.transform.TensorwiseTransform

Clips update value magnitude growth.

Parameters:

  • add (float | None, default: None ) –

    additive clipping, next update is at most previous update + add. Defaults to None.

  • mul (float | None, default: 1.5 ) –

    multiplicative clipping, next update is at most previous update * mul. Defaults to 1.5.

  • min_value (float | None, default: 0.0001 ) –

    minimum value for multiplicative clipping to prevent collapse to 0. Next update is at most max(prev_update, min_value) * mul. Defaults to 1e-4.

  • max_decay (float | None, default: 2 ) –

    bounds the tracked multiplicative clipping decay to prevent collapse to 0. Next update is at most max(previous update * mul, max_decay). Defaults to 2.

  • target (Literal, default: 'update' ) –

    what to set on var. Defaults to "update".
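
Examples:

An illustrative sketch (not from the library docs) that limits element-wise update growth:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValueGrowth(mul=1.5),
    tz.m.LR(1e-2),
)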

Source code in torchzero/modules/clipping/growth_clipping.py
class ClipValueGrowth(TensorwiseTransform):
    """Clips update value magnitude growth.

    Args:
        add (float | None, optional): additive clipping, next update is at most `previous update + add`. Defaults to None.
        mul (float | None, optional): multiplicative clipping, next update is at most `previous update * mul`. Defaults to 1.5.
        min_value (float | None, optional):
            minimum value for multiplicative clipping to prevent collapse to 0.
            Next update is at most :code:`max(prev_update, min_value) * mul`. Defaults to 1e-4.
        max_decay (float | None, optional):
            bounds the tracked multiplicative clipping decay to prevent collapse to 0.
            Next update is at most :code:`max(previous update * mul, max_decay)`.
            Defaults to 2.
        target (Target, optional): what to set on var. Defaults to "update".
    """
    def __init__(
        self,
        add: float | None = None,
        mul: float | None = 1.5,
        min_value: float | None = 1e-4,
        max_decay: float | None = 2,
        target: Target = "update",
    ):
        defaults = dict(add=add, mul=mul, min_value=min_value, max_decay=max_decay)
        super().__init__(defaults, target=target)


    def apply_tensor(self, tensor, param, grad, loss, state, setting):
        add, mul, min_value, max_decay = itemgetter('add','mul','min_value','max_decay')(setting)
        add: float | None

        if add is None and mul is None:
            return tensor

        if 'prev' not in state:
            state['prev'] = tensor.clone()
            return tensor

        prev: torch.Tensor = state['prev']

        # additive bound
        if add is not None:
            growth = (tensor.abs() - prev.abs()).clip(min=0)
            tensor.sub_(torch.where(growth > add, (growth-add).copysign_(tensor), 0))

        # multiplicative bound
        growth = None
        if mul is not None:
            prev_magn = prev.abs()
            if min_value is not None: prev_magn.clip_(min=min_value)
            growth = (tensor.abs() / prev_magn).clamp_(min=1e-8)

            denom = torch.where(growth > mul, growth/mul, 1)

            tensor.div_(denom)

        # limit max growth decay
        if max_decay is not None:
            if growth is None:
                prev_magn = prev.abs()
                if min_value is not None: prev_magn.clip_(min=min_value)
                growth = (tensor.abs() / prev_magn).clamp_(min=1e-8)

            new_prev = torch.where(growth < (1/max_decay), prev/max_decay, tensor)
        else:
            new_prev = tensor.clone()

        state['prev'] = new_prev
        return tensor

Normalize

Bases: torchzero.core.transform.Transform

Normalizes the update.

Parameters:

  • norm_value (float, default: 1 ) –

    desired norm value.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are normalized.

  • min_size (int, default: 1 ) –

    minimal size of a dimension to normalize along it. Defaults to 1.

  • target (str, default: 'update' ) –

    what this affects.

Examples:

Gradient normalization:

opt = tz.Modular(
    model.parameters(),
    tz.m.Normalize(1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)

Update normalization:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.Normalize(1),
    tz.m.LR(1e-2),
)

Source code in torchzero/modules/clipping/clipping.py
class Normalize(Transform):
    """Normalizes the update.

    Args:
        norm_value (float): desired norm value.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are normalized.
        min_size (int, optional):
            minimal size of a dimension to normalize along it. Defaults to 1.
        target (str, optional):
            what this affects.

    Examples:
    Gradient normalization:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Normalize(1),
        tz.m.Adam(),
        tz.m.LR(1e-2),
    )
    ```

    Update normalization:

    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Adam(),
        tz.m.Normalize(1),
        tz.m.LR(1e-2),
    )
    ```
    """
    def __init__(
        self,
        norm_value: float = 1,
        ord: Metrics = 2,
        dim: int | Sequence[int] | Literal["global"] | None = None,
        inverse_dims: bool = False,
        min_size: int = 1,
        target: Target = "update",
    ):
        defaults = dict(norm_value=norm_value,ord=ord,dim=dim,min_size=min_size, inverse_dims=inverse_dims)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        norm_value = NumberList(s['norm_value'] for s in settings)
        ord, dim, min_size, inverse_dims = itemgetter('ord', 'dim', 'min_size', 'inverse_dims')(settings[0])

        _clip_norm_(
            tensors_ = TensorList(tensors),
            min = None,
            max = None,
            norm_value = norm_value,
            ord = ord,
            dim = dim,
            inverse_dims=inverse_dims,
            min_size = min_size,
        )

        return tensors

NormalizeByEMA

Bases: torchzero.modules.clipping.ema_clipping.ClipNormByEMA

Sets norm of the update to be the same as the norm of an exponential moving average of past updates.

Parameters:

  • beta (float, default: 0.99 ) –

    beta for the exponential moving average. Defaults to 0.99.

  • ord (float, default: 2 ) –

    order of the norm. Defaults to 2.

  • eps (float, default: 1e-06 ) –

    epsilon for division. Defaults to 1e-6.

  • tensorwise (bool, default: True ) –

    if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.

  • max_ema_growth (float | None, default: 1.5 ) –

    if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this factor per step. Defaults to 1.5.

  • ema_init (str, default: 'zeros' ) –

    How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
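
Examples:

An illustrative sketch (not from the library docs) that rescales the Adam update to the EMA norm:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.NormalizeByEMA(beta=0.99),
    tz.m.LR(1e-2),
)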

Source code in torchzero/modules/clipping/ema_clipping.py
class NormalizeByEMA(ClipNormByEMA):
    """Sets norm of the update to be the same as the norm of an exponential moving average of past updates.

    Args:
        beta (float, optional): beta for the exponential moving average. Defaults to 0.99.
        ord (float, optional): order of the norm. Defaults to 2.
        eps (float, optional): epsilon for division. Defaults to 1e-6.
        tensorwise (bool, optional):
            if True, norms are calculated parameter-wise, otherwise treats all parameters as single vector. Defaults to True.
        max_ema_growth (float | None, optional):
            if specified, restricts how quickly exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
        ema_init (str, optional):
            How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
    """
    NORMALIZE = True

NORMALIZE class-attribute

NORMALIZE = True

clip_grad_norm_

clip_grad_norm_(params: Iterable[Tensor], max_norm: float | None, ord: Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'] | float | Tensor = 2, dim: int | Sequence[int] | Literal['global'] | None = None, inverse_dims: bool = False, min_size: int = 2, min_norm: float | None = None)

Clips gradient of an iterable of parameters to specified norm value. Gradients are modified in-place.

Parameters:

  • params (Iterable[Tensor]) –

    parameters with gradients to clip.

  • max_norm (float) –

    value to clip norm to.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • min_size (int, default: 2 ) –

    minimal size of a dimension to normalize along it. Defaults to 2.
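
Examples:

A usage sketch (not from the library docs); the import path is assumed from the source location shown below:

from torchzero.modules.clipping import clip_grad_norm_  # assumed import path

loss = criterion(model(x), y)  # placeholder model, criterion and batch
loss.backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradient norm(s) to at most 1; dim="global" would use one global norm
optimizer.step()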

Source code in torchzero/modules/clipping/clipping.py
def clip_grad_norm_(
    params: Iterable[torch.Tensor],
    max_norm: float | None,
    ord: Metrics = 2,
    dim: int | Sequence[int] | Literal["global"] | None = None,
    inverse_dims: bool = False,
    min_size: int = 2,
    min_norm: float | None = None,
):
    """Clips gradient of an iterable of parameters to specified norm value.
    Gradients are modified in-place.

    Args:
        params (Iterable[torch.Tensor]): parameters with gradients to clip.
        max_norm (float): value to clip norm to.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        min_size (int, optional):
            minimal size of a dimension to normalize along it. Defaults to 2.
    """
    grads = TensorList(p.grad for p in params if p.grad is not None)
    _clip_norm_(grads, min=min_norm, max=max_norm, norm_value=None, ord=ord, dim=dim, inverse_dims=inverse_dims, min_size=min_size)

clip_grad_value_

clip_grad_value_(params: Iterable[Tensor], value: float)

Clips gradient of an iterable of parameters at specified value. Gradients are modified in-place.

Parameters:

  • params (Iterable[Tensor]) –

    iterable of tensors with gradients to clip.

  • value (float) –

    maximum allowed absolute value of each gradient element.
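
Examples:

A usage sketch (not from the library docs); the import path is assumed from the source location shown below:

from torchzero.modules.clipping import clip_grad_value_  # assumed import path

loss.backward()
clip_grad_value_(model.parameters(), 1.0)  # clamp every gradient element to [-1, 1]
optimizer.step()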

Source code in torchzero/modules/clipping/clipping.py
def clip_grad_value_(params: Iterable[torch.Tensor], value: float):
    """Clips gradient of an iterable of parameters at specified value.
    Gradients are modified in-place.

    Args:
        params (Iterable[Tensor]): iterable of tensors with gradients to clip.
        value (float or int): maximum allowed value of gradient
    """
    grads = [p.grad for p in params if p.grad is not None]
    torch._foreach_clamp_min_(grads, -value)
    torch._foreach_clamp_max_(grads, value)

normalize_grads_

normalize_grads_(params: Iterable[Tensor], norm_value: float, ord: Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'] | float | Tensor = 2, dim: int | Sequence[int] | Literal['global'] | None = None, inverse_dims: bool = False, min_size: int = 1)

Normalizes gradient of an iterable of parameters to specified norm value. Gradients are modified in-place.

Parameters:

  • params (Iterable[Tensor]) –

    parameters with gradients to normalize.

  • norm_value (float) –

    desired norm value.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are normalized.

  • min_size (int, default: 1 ) –

    minimal size of a dimension to normalize along it. Defaults to 1.
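
Examples:

A usage sketch (not from the library docs); the import path is assumed from the source location shown below:

from torchzero.modules.clipping import normalize_grads_  # assumed import path

loss.backward()
normalize_grads_(model.parameters(), norm_value=1.0, dim="global")  # rescale so the global gradient norm equals 1
optimizer.step()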

Source code in torchzero/modules/clipping/clipping.py
def normalize_grads_(
    params: Iterable[torch.Tensor],
    norm_value: float,
    ord: Metrics = 2,
    dim: int | Sequence[int] | Literal["global"] | None = None,
    inverse_dims: bool = False,
    min_size: int = 1,
):
    """Normalizes gradient of an iterable of parameters to specified norm value.
    Gradients are modified in-place.

    Args:
        params (Iterable[torch.Tensor]): parameters with gradients to normalize.
        norm_value (float): desired norm value.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are normalized.
        min_size (int, optional):
            minimal size of a dimension to normalize along it. Defaults to 1.
    """
    grads = TensorList(p.grad for p in params if p.grad is not None)
    _clip_norm_(grads, min=None, max=None, norm_value=norm_value, ord=ord, dim=dim, inverse_dims=inverse_dims, min_size=min_size)