Clipping

This subpackage contains modules for gradient clipping, normalization, centralization, and related operations.

Classes:

  • Centralize

    Centralizes the update.

  • ClipNorm

    Clips update norm to be no larger than max_norm.

  • ClipNormByEMA

    Clips norm to be no larger than the norm of an exponential moving average of past updates.

  • ClipNormGrowth

    Clips update norm growth.

  • ClipValue

    Clips update magnitude to be within (-value, value) range.

  • ClipValueByEMA

    Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.

  • ClipValueGrowth

    Clips update value magnitude growth.

  • Normalize

    Normalizes the update.

  • NormalizeByEMA

    Sets norm of the update to be the same as the norm of an exponential moving average of past updates.

Functions:

  • clip_grad_norm_

    Clips gradient of an iterable of parameters to specified norm value.

  • clip_grad_value_

    Clips gradient of an iterable of parameters at specified value.

  • normalize_grads_

    Normalizes gradient of an iterable of parameters to specified norm value.

Centralize

Bases: torchzero.core.transform.Transform

Centralizes the update.

Parameters:

  • dim (int | Sequence[int] | str | None, default: None ) –

    centralizes (subtracts the mean) along those dimensions. If list/tuple, tensors are centralized along all dimensions in dim that they have. Can be set to "global" to centralize by the global mean of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are centralized.

  • min_size (int, default: 2 ) –

    minimal size of a dimension to centralize along it. Defaults to 2.

Examples:

Standard gradient centralization:

opt = tz.Modular(
    model.parameters(),
    tz.m.Centralize(dim=0),
    tz.m.LR(1e-2),
)

References:

  • Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16 (pp. 635-652). Springer International Publishing. https://arxiv.org/abs/2004.01461

Source code in torchzero/modules/clipping/clipping.py
class Centralize(Transform):
    """Centralizes the update.

    Args:
        dim (int | Sequence[int] | str | None, optional):
            centralizes (subtracts the mean) along those dimensions.
            If list/tuple, tensors are centralized along all dimensions in `dim` that they have.
            Can be set to "global" to centralize by the global mean of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are centralized.
        min_size (int, optional):
            minimal size of a dimension to centralize along it. Defaults to 2.

    Examples:

    Standard gradient centralization:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Centralize(dim=0),
        tz.m.LR(1e-2),
    )
    ```

    References:
    - Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16 (pp. 635-652). Springer International Publishing. https://arxiv.org/abs/2004.01461
    """
    def __init__(
        self,
        dim: int | Sequence[int] | Literal["global"] | None = None,
        inverse_dims: bool = False,
        min_size: int = 2,
        target: Target = "update",
    ):
        defaults = dict(dim=dim,min_size=min_size,inverse_dims=inverse_dims)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        dim, min_size, inverse_dims = itemgetter('dim', 'min_size', 'inverse_dims')(settings[0])

        _centralize_(tensors_ = TensorList(tensors), dim=dim, inverse_dims=inverse_dims, min_size=min_size)

        return tensors

ClipNorm

Bases: torchzero.core.transform.Transform

Clips update norm to be no larger than max_norm.

Parameters:

  • max_norm (float) –

    value to clip norm to.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are normalized.

  • min_size (int, default: 1 ) –

    minimal number of elements in a parameter or slice to clip norm. Defaults to 1.

  • target (str, default: 'update' ) –

    what this affects.

Examples:

Gradient norm clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNorm(1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)

Update norm clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNorm(1),
    tz.m.LR(1e-2),
)

Source code in torchzero/modules/clipping/clipping.py
class ClipNorm(Transform):
    """Clips update norm to be no larger than `value`.

    Args:
        max_norm (float): value to clip norm to.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are normalized.
        min_size (int, optional):
            minimal number of elements in a parameter or slice to clip norm. Defaults to 1.
        target (str, optional):
            what this affects.

    Examples:

    Gradient norm clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.ClipNorm(1),
        tz.m.Adam(),
        tz.m.LR(1e-2),
    )
    ```

    Update norm clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Adam(),
        tz.m.ClipNorm(1),
        tz.m.LR(1e-2),
    )
    ```
    """
    def __init__(
        self,
        max_norm: float,
        ord: Metrics = 2,
        dim: int | Sequence[int] | Literal["global"] | None = None,
        inverse_dims: bool = False,
        min_size: int = 1,
        target: Target = "update",
    ):
        defaults = dict(max_norm=max_norm,ord=ord,dim=dim,min_size=min_size,inverse_dims=inverse_dims)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        max_norm = NumberList(s['max_norm'] for s in settings)
        ord, dim, min_size, inverse_dims = itemgetter('ord', 'dim', 'min_size', 'inverse_dims')(settings[0])
        _clip_norm_(
            tensors_ = TensorList(tensors),
            min = 0,
            max = max_norm,
            norm_value = None,
            ord = ord,
            dim = dim,
            inverse_dims=inverse_dims,
            min_size = min_size,
        )
        return tensors

ClipNormByEMA

Bases: torchzero.core.transform.Transform

Clips norm to be no larger than the norm of an exponential moving average of past updates.

Parameters:

  • beta (float, default: 0.99 ) –

    beta for the exponential moving average. Defaults to 0.99.

  • ord (float, default: 2 ) –

    order of the norm. Defaults to 2.

  • eps (float, default: 1e-06 ) –

    epsilon for division. Defaults to 1e-6.

  • tensorwise (bool, default: True ) –

    if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.

  • max_ema_growth (float | None, default: 1.5 ) –

    if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this factor per step. Defaults to 1.5.

  • ema_init (str, default: 'zeros' ) –

    How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
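
Examples:

A usage sketch (illustrative, not taken from the library's own docs), following the composition pattern of the other examples on this page:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNormByEMA(beta=0.99),
    tz.m.LR(1e-2),
)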

Source code in torchzero/modules/clipping/ema_clipping.py
class ClipNormByEMA(Transform):
    """Clips norm to be no larger than the norm of an exponential moving average of past updates.

    Args:
        beta (float, optional): beta for the exponential moving average. Defaults to 0.99.
        ord (float, optional): order of the norm. Defaults to 2.
        eps (float, optional): epsilon for division. Defaults to 1e-6.
        tensorwise (bool, optional):
            if True, norms are calculated parameter-wise, otherwise treats all parameters as single vector. Defaults to True.
        max_ema_growth (float | None, optional):
            if specified, restricts how quickly exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
        ema_init (str, optional):
            How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
    """
    NORMALIZE = False
    def __init__(
        self,
        beta=0.99,
        ord: Metrics = 2,
        eps=1e-6,
        tensorwise:bool=True,
        max_ema_growth: float | None = 1.5,
        ema_init: Literal['zeros', 'update'] = 'zeros',
        inner: Chainable | None = None,
    ):
        defaults = dict(beta=beta, ord=ord, tensorwise=tensorwise, ema_init=ema_init, eps=eps, max_ema_growth=max_ema_growth)
        super().__init__(defaults, inner=inner)

    @torch.no_grad
    def update_tensors(self, tensors, params, grads, loss, states, settings):
        tensors = TensorList(tensors)
        ord, tensorwise, ema_init, max_ema_growth = itemgetter('ord', 'tensorwise', 'ema_init', 'max_ema_growth')(settings[0])

        beta, eps = unpack_dicts(settings, 'beta', 'eps', cls=NumberList)

        ema = unpack_states(states, tensors, 'ema', init = (torch.zeros_like if ema_init=='zeros' else tensors), cls=TensorList)

        ema.lerp_(tensors, 1-beta)

        if tensorwise:
            ema_norm = ema.metric(ord)

            # clip ema norm growth
            if max_ema_growth is not None:
                prev_ema_norm = unpack_states(states, tensors, 'prev_ema_norm', init=ema_norm, cls=TensorList)
                allowed_norm = (prev_ema_norm * max_ema_growth).clip(min=1e-6)
                ema_denom = (ema_norm / allowed_norm).clip(min=1)
                ema.div_(ema_denom)
                ema_norm.div_(ema_denom)
                prev_ema_norm.set_(ema_norm)

            tensors_norm = tensors.norm(ord)
            denom = tensors_norm / ema_norm.clip(min=eps)
            if self.NORMALIZE: denom.clip_(min=eps)
            else: denom.clip_(min=1)

        else:
            ema_norm = ema.global_metric(ord)

            # clip ema norm growth
            if max_ema_growth is not None:
                prev_ema_norm = self.global_state.setdefault('prev_ema_norm', ema_norm)
                allowed_norm = prev_ema_norm * max_ema_growth
                if ema_norm > allowed_norm:
                    ema.div_(ema_norm / allowed_norm)
                    ema_norm = allowed_norm
                prev_ema_norm.set_(ema_norm)

            tensors_norm = tensors.global_metric(ord)
            denom = tensors_norm / ema_norm.clip(min=eps[0])
            if self.NORMALIZE: denom.clip_(min=eps[0])
            else: denom.clip_(min=1)

        self.global_state['denom'] = denom

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        denom = self.global_state.pop('denom')
        torch._foreach_div_(tensors, denom)
        return tensors

NORMALIZE class-attribute

NORMALIZE = False

ClipNormGrowth

Bases: torchzero.core.transform.Transform

Clips update norm growth.

Parameters:

  • add (float | None, default: None ) –

    additive clipping, next update norm is at most previous norm + add. Defaults to None.

  • mul (float | None, default: 1.5 ) –

    multiplicative clipping, next update norm is at most previous norm * mul. Defaults to 1.5.

  • min_value (float | None, default: 0.0001 ) –

    minimum value for multiplicative clipping to prevent collapse to 0. Next norm is at most max(prev_norm, min_value) * mul. Defaults to 1e-4.

  • max_decay (float | None, default: 2 ) –

    bounds the tracked multiplicative clipping decay to prevent collapse to 0. Next norm is at most max(previous norm * mul, max_decay). Defaults to 2.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • parameterwise (bool, default: True ) –

    if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.

  • target (Literal, default: 'update' ) –

    what to set on var. Defaults to "update".
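
Examples:

An illustrative sketch (not from the library docs) that limits how quickly the update norm may grow between steps:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNormGrowth(mul=1.5),
    tz.m.LR(1e-2),
)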

Source code in torchzero/modules/clipping/growth_clipping.py
class ClipNormGrowth(Transform):
    """Clips update norm growth.

    Args:
        add (float | None, optional): additive clipping, next update norm is at most `previous norm + add`. Defaults to None.
        mul (float | None, optional):
            multiplicative clipping, next update norm is at most `previous norm * mul`. Defaults to 1.5.
        min_value (float | None, optional):
            minimum value for multiplicative clipping to prevent collapse to 0.
            Next norm is at most :code:`max(prev_norm, min_value) * mul`. Defaults to 1e-4.
        max_decay (float | None, optional):
            bounds the tracked multiplicative clipping decay to prevent collapse to 0.
            Next norm is at most :code:`max(previous norm * mul, max_decay)`.
            Defaults to 2.
        ord (float, optional): norm order. Defaults to 2.
        parameterwise (bool, optional):
            if True, norms are calculated parameter-wise, otherwise treats all parameters as single vector. Defaults to True.
        target (Target, optional): what to set on var. Defaults to "update".
    """
    def __init__(
        self,
        add: float | None = None,
        mul: float | None = 1.5,
        min_value: float | None = 1e-4,
        max_decay: float | None = 2,
        ord: float = 2,
        parameterwise=True,
        target: Target = "update",
    ):
        defaults = dict(add=add, mul=mul, min_value=min_value, max_decay=max_decay, ord=ord, parameterwise=parameterwise)
        super().__init__(defaults, target=target)



    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        parameterwise = settings[0]['parameterwise']
        tensors = TensorList(tensors)

        if parameterwise:
            ts = tensors
            stts = states
            stns = settings

        else:
            ts = [tensors.to_vec()]
            stts = [self.global_state]
            stns = [settings[0]]


        for t, state, setting in zip(ts, stts, stns):
            if 'prev_norm' not in state:
                state['prev_norm'] = torch.linalg.vector_norm(t, ord=setting['ord']) # pylint:disable=not-callable
                state['prev_denom'] = 1
                continue

            _,  state['prev_norm'], state['prev_denom'] = norm_growth_clip_(
                tensor_ = t,
                prev_norm = state['prev_norm'],
                add = setting['add'],
                mul = setting['mul'],
                min_value = setting['min_value'],
                max_decay = setting['max_decay'],
                ord = setting['ord'],
            )

        if not parameterwise:
            tensors.from_vec_(ts[0])

        return tensors

ClipValue

Bases: torchzero.core.transform.Transform

Clips update magnitude to be within (-value, value) range.

Parameters:

  • value (float) –

    value to clip to.

  • target (str, default: 'update' ) –

    refer to target argument in documentation.

Examples:

Gradient clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValue(1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)

Update clipping:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValue(1),
    tz.m.LR(1e-2),
)

Source code in torchzero/modules/clipping/clipping.py
class ClipValue(Transform):
    """Clips update magnitude to be within ``(-value, value)`` range.

    Args:
        value (float): value to clip to.
        target (str): refer to ``target argument`` in documentation.

    Examples:

    Gradient clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.ClipValue(1),
        tz.m.Adam(),
        tz.m.LR(1e-2),
    )
    ```

    Update clipping:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Adam(),
        tz.m.ClipValue(1),
        tz.m.LR(1e-2),
    )
    ```

    """
    def __init__(self, value: float, target: Target = 'update'):
        defaults = dict(value=value)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        value = [s['value'] for s in settings]
        return TensorList(tensors).clip_([-v for v in value], value)

ClipValueByEMA

Bases: torchzero.core.transform.Transform

Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.

Parameters:

  • beta (float, default: 0.99 ) –

    beta for the exponential moving average. Defaults to 0.99.

  • ema_init (str, default: 'zeros' ) –

    How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.

  • ema_tfm (Chainable | None, default: None ) –

    optional modules applied to exponential moving average before clipping by it. Defaults to None.
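
Examples:

An illustrative sketch (not from the library docs), clipping raw gradient magnitudes by their EMA before Adam:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValueByEMA(beta=0.99),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)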

Source code in torchzero/modules/clipping/ema_clipping.py
class ClipValueByEMA(Transform):
    """Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.

    Args:
        beta (float, optional): beta for the exponential moving average. Defaults to 0.99.
        ema_init (str, optional):
            How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
        ema_tfm (Chainable | None, optional):
            optional modules applied to exponential moving average before clipping by it. Defaults to None.
    """
    def __init__(
        self,
        beta=0.99,
        ema_init: Literal['zeros', 'update'] = 'zeros',
        ema_tfm:Chainable | None=None,
        inner: Chainable | None = None,
    ):
        defaults = dict(beta=beta, ema_init=ema_init)
        super().__init__(defaults, inner=inner)

        if ema_tfm is not None:
            self.set_child('ema_tfm', ema_tfm)

    @torch.no_grad
    def update_tensors(self, tensors, params, grads, loss, states, settings):
        ema_init = itemgetter('ema_init')(settings[0])

        beta = unpack_dicts(settings, 'beta', cls=NumberList)
        tensors = TensorList(tensors)

        ema = unpack_states(states, tensors, 'ema', init = (torch.zeros_like if ema_init=='zeros' else lambda t: t.abs()), cls=TensorList)
        ema.lerp_(tensors.abs(), 1-beta)

    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        tensors = TensorList(tensors)
        ema = unpack_states(states, tensors, 'ema', cls=TensorList)

        if 'ema_tfm' in self.children:
            ema = TensorList(apply_transform(self.children['ema_tfm'], ema.clone(), params, grads, loss))

        tensors.clip_(-ema, ema)
        return tensors

ClipValueGrowth

Bases: torchzero.core.transform.TensorwiseTransform

Clips update value magnitude growth.

Parameters:

  • add (float | None, default: None ) –

    additive clipping, next update is at most previous update + add. Defaults to None.

  • mul (float | None, default: 1.5 ) –

    multiplicative clipping, next update is at most previous update * mul. Defaults to 1.5.

  • min_value (float | None, default: 0.0001 ) –

    minimum value for multiplicative clipping to prevent collapse to 0. Next update is at most max(prev_update, min_value) * mul. Defaults to 1e-4.

  • max_decay (float | None, default: 2 ) –

    bounds the tracked multiplicative clipping decay to prevent collapse to 0. Next update is at most max(previous update * mul, max_decay). Defaults to 2.

  • target (Literal, default: 'update' ) –

    what to set on var. Defaults to "update".
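
Examples:

An illustrative sketch (not from the library docs) that limits element-wise update growth:

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValueGrowth(mul=1.5),
    tz.m.LR(1e-2),
)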

Source code in torchzero/modules/clipping/growth_clipping.py
class ClipValueGrowth(TensorwiseTransform):
    """Clips update value magnitude growth.

    Args:
        add (float | None, optional): additive clipping, next update is at most `previous update + add`. Defaults to None.
        mul (float | None, optional): multiplicative clipping, next update is at most `previous update * mul`. Defaults to 1.5.
        min_value (float | None, optional):
            minimum value for multiplicative clipping to prevent collapse to 0.
            Next update is at most :code:`max(prev_update, min_value) * mul`. Defaults to 1e-4.
        max_decay (float | None, optional):
            bounds the tracked multiplicative clipping decay to prevent collapse to 0.
            Next update is at most :code:`max(previous update * mul, max_decay)`.
            Defaults to 2.
        target (Target, optional): what to set on var. Defaults to "update".
    """
    def __init__(
        self,
        add: float | None = None,
        mul: float | None = 1.5,
        min_value: float | None = 1e-4,
        max_decay: float | None = 2,
        target: Target = "update",
    ):
        defaults = dict(add=add, mul=mul, min_value=min_value, max_decay=max_decay)
        super().__init__(defaults, target=target)


    def apply_tensor(self, tensor, param, grad, loss, state, setting):
        add, mul, min_value, max_decay = itemgetter('add','mul','min_value','max_decay')(setting)
        add: float | None

        if add is None and mul is None:
            return tensor

        if 'prev' not in state:
            state['prev'] = tensor.clone()
            return tensor

        prev: torch.Tensor = state['prev']

        # additive bound
        if add is not None:
            growth = (tensor.abs() - prev.abs()).clip(min=0)
            tensor.sub_(torch.where(growth > add, (growth-add).copysign_(tensor), 0))

        # multiplicative bound
        growth = None
        if mul is not None:
            prev_magn = prev.abs()
            if min_value is not None: prev_magn.clip_(min=min_value)
            growth = (tensor.abs() / prev_magn).clamp_(min=1e-8)

            denom = torch.where(growth > mul, growth/mul, 1)

            tensor.div_(denom)

        # limit max growth decay
        if max_decay is not None:
            if growth is None:
                prev_magn = prev.abs()
                if min_value is not None: prev_magn.clip_(min=min_value)
                growth = (tensor.abs() / prev_magn).clamp_(min=1e-8)

            new_prev = torch.where(growth < (1/max_decay), prev/max_decay, tensor)
        else:
            new_prev = tensor.clone()

        state['prev'] = new_prev
        return tensor

Normalize

Bases: torchzero.core.transform.Transform

Normalizes the update.

Parameters:

  • norm_value (float, default: 1 ) –

    desired norm value.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are normalized.

  • min_size (int, default: 1 ) –

    minimal size of a dimension to normalize along it. Defaults to 1.

  • target (str, default: 'update' ) –

    what this affects.

Examples:

Gradient normalization:

opt = tz.Modular(
    model.parameters(),
    tz.m.Normalize(1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)

Update normalization:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.Normalize(1),
    tz.m.LR(1e-2),
)

Source code in torchzero/modules/clipping/clipping.py
class Normalize(Transform):
    """Normalizes the update.

    Args:
        norm_value (float): desired norm value.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are normalized.
        min_size (int, optional):
            minimal size of a dimension to normalize along it. Defaults to 1.
        target (str, optional):
            what this affects.

    Examples:
    Gradient normalization:
    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Normalize(1),
        tz.m.Adam(),
        tz.m.LR(1e-2),
    )
    ```

    Update normalization:

    ```python
    opt = tz.Modular(
        model.parameters(),
        tz.m.Adam(),
        tz.m.Normalize(1),
        tz.m.LR(1e-2),
    )
    ```
    """
    def __init__(
        self,
        norm_value: float = 1,
        ord: Metrics = 2,
        dim: int | Sequence[int] | Literal["global"] | None = None,
        inverse_dims: bool = False,
        min_size: int = 1,
        target: Target = "update",
    ):
        defaults = dict(norm_value=norm_value,ord=ord,dim=dim,min_size=min_size, inverse_dims=inverse_dims)
        super().__init__(defaults, target=target)

    @torch.no_grad
    def apply_tensors(self, tensors, params, grads, loss, states, settings):
        norm_value = NumberList(s['norm_value'] for s in settings)
        ord, dim, min_size, inverse_dims = itemgetter('ord', 'dim', 'min_size', 'inverse_dims')(settings[0])

        _clip_norm_(
            tensors_ = TensorList(tensors),
            min = None,
            max = None,
            norm_value = norm_value,
            ord = ord,
            dim = dim,
            inverse_dims=inverse_dims,
            min_size = min_size,
        )

        return tensors

NormalizeByEMA

Bases: torchzero.modules.clipping.ema_clipping.ClipNormByEMA

Sets norm of the update to be the same as the norm of an exponential moving average of past updates.

Parameters:

  • beta (float, default: 0.99 ) –

    beta for the exponential moving average. Defaults to 0.99.

  • ord (float, default: 2 ) –

    order of the norm. Defaults to 2.

  • eps (float, default: 1e-06 ) –

    epsilon for division. Defaults to 1e-6.

  • tensorwise (bool, default: True ) –

    if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.

  • max_ema_growth (float | None, default: 1.5 ) –

    if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this factor per step. Defaults to 1.5.

  • ema_init (str, default: 'zeros' ) –

    How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
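
Examples:

An illustrative sketch (not from the library docs) that rescales the Adam update to the EMA norm:

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.NormalizeByEMA(beta=0.99),
    tz.m.LR(1e-2),
)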

Source code in torchzero/modules/clipping/ema_clipping.py
class NormalizeByEMA(ClipNormByEMA):
    """Sets norm of the update to be the same as the norm of an exponential moving average of past updates.

    Args:
        beta (float, optional): beta for the exponential moving average. Defaults to 0.99.
        ord (float, optional): order of the norm. Defaults to 2.
        eps (float, optional): epsilon for division. Defaults to 1e-6.
        tensorwise (bool, optional):
            if True, norms are calculated parameter-wise, otherwise treats all parameters as single vector. Defaults to True.
        max_ema_growth (float | None, optional):
            if specified, restricts how quickly exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
        ema_init (str, optional):
            How to initialize exponential moving average on first step, "update" to use the first update or "zeros". Defaults to 'zeros'.
    """
    NORMALIZE = True

NORMALIZE class-attribute

NORMALIZE = True

clip_grad_norm_

clip_grad_norm_(params: Iterable[Tensor], max_norm: float | None, ord: Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'] | float | Tensor = 2, dim: int | Sequence[int] | Literal['global'] | None = None, inverse_dims: bool = False, min_size: int = 2, min_norm: float | None = None)

Clips gradient of an iterable of parameters to specified norm value. Gradients are modified in-place.

Parameters:

  • params (Iterable[Tensor]) –

    parameters with gradients to clip.

  • max_norm (float) –

    value to clip norm to.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • min_size (int, default: 2 ) –

    minimal size of a dimension to normalize along it. Defaults to 2.
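
Examples:

A usage sketch (not from the library docs); the import path is assumed from the source location shown below:

from torchzero.modules.clipping import clip_grad_norm_  # assumed import path

loss = criterion(model(x), y)  # placeholder model, criterion and batch
loss.backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip gradient norm(s) to at most 1; dim="global" would use one global norm
optimizer.step()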

Source code in torchzero/modules/clipping/clipping.py
def clip_grad_norm_(
    params: Iterable[torch.Tensor],
    max_norm: float | None,
    ord: Metrics = 2,
    dim: int | Sequence[int] | Literal["global"] | None = None,
    inverse_dims: bool = False,
    min_size: int = 2,
    min_norm: float | None = None,
):
    """Clips gradient of an iterable of parameters to specified norm value.
    Gradients are modified in-place.

    Args:
        params (Iterable[torch.Tensor]): parameters with gradients to clip.
        max_norm (float): value to clip norm to.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        min_size (int, optional):
            minimal size of a dimension to normalize along it. Defaults to 2.
    """
    grads = TensorList(p.grad for p in params if p.grad is not None)
    _clip_norm_(grads, min=min_norm, max=max_norm, norm_value=None, ord=ord, dim=dim, inverse_dims=inverse_dims, min_size=min_size)

clip_grad_value_

clip_grad_value_(params: Iterable[Tensor], value: float)

Clips gradient of an iterable of parameters at specified value. Gradients are modified in-place.

Parameters:

  • params (Iterable[Tensor]) –

    iterable of tensors with gradients to clip.

  • value (float) –

    maximum allowed absolute value of each gradient element.
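
Examples:

A usage sketch (not from the library docs); the import path is assumed from the source location shown below:

from torchzero.modules.clipping import clip_grad_value_  # assumed import path

loss.backward()
clip_grad_value_(model.parameters(), 1.0)  # clamp every gradient element to [-1, 1]
optimizer.step()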

Source code in torchzero/modules/clipping/clipping.py
def clip_grad_value_(params: Iterable[torch.Tensor], value: float):
    """Clips gradient of an iterable of parameters at specified value.
    Gradients are modified in-place.

    Args:
        params (Iterable[Tensor]): iterable of tensors with gradients to clip.
        value (float or int): maximum allowed value of gradient
    """
    grads = [p.grad for p in params if p.grad is not None]
    torch._foreach_clamp_min_(grads, -value)
    torch._foreach_clamp_max_(grads, value)

normalize_grads_

normalize_grads_(params: Iterable[Tensor], norm_value: float, ord: Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'] | float | Tensor = 2, dim: int | Sequence[int] | Literal['global'] | None = None, inverse_dims: bool = False, min_size: int = 1)

Normalizes gradient of an iterable of parameters to specified norm value. Gradients are modified in-place.

Parameters:

  • params (Iterable[Tensor]) –

    parameters with gradients to normalize.

  • norm_value (float) –

    desired norm value.

  • ord (float, default: 2 ) –

    norm order. Defaults to 2.

  • dim (int | Sequence[int] | str | None, default: None ) –

    calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in dim that they have. Can be set to "global" to normalize by global norm of all gradients concatenated to a vector. Defaults to None.

  • inverse_dims (bool, default: False ) –

    if True, the dims argument is inverted, and all other dimensions are normalized.

  • min_size (int, default: 1 ) –

    minimal size of a dimension to normalize along it. Defaults to 1.
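
Examples:

A usage sketch (not from the library docs); the import path is assumed from the source location shown below:

from torchzero.modules.clipping import normalize_grads_  # assumed import path

loss.backward()
normalize_grads_(model.parameters(), norm_value=1.0, dim="global")  # rescale so the global gradient norm equals 1
optimizer.step()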

Source code in torchzero/modules/clipping/clipping.py
def normalize_grads_(
    params: Iterable[torch.Tensor],
    norm_value: float,
    ord: Metrics = 2,
    dim: int | Sequence[int] | Literal["global"] | None = None,
    inverse_dims: bool = False,
    min_size: int = 1,
):
    """Normalizes gradient of an iterable of parameters to specified norm value.
    Gradients are modified in-place.

    Args:
        params (Iterable[torch.Tensor]): parameters with gradients to normalize.
        norm_value (float): desired norm value.
        ord (float, optional): norm order. Defaults to 2.
        dim (int | Sequence[int] | str | None, optional):
            calculates norm along those dimensions.
            If list/tuple, tensors are normalized along all dimensions in `dim` that they have.
            Can be set to "global" to normalize by global norm of all gradients concatenated to a vector.
            Defaults to None.
        inverse_dims (bool, optional):
            if True, the `dims` argument is inverted, and all other dimensions are normalized.
        min_size (int, optional):
            minimal size of a dimension to normalize along it. Defaults to 1.
    """
    grads = TensorList(p.grad for p in params if p.grad is not None)
    _clip_norm_(grads, min=None, max=None, norm_value=norm_value, ord=ord, dim=dim, inverse_dims=inverse_dims, min_size=min_size)