Clipping¶
This subpackage contains modules for gradient clipping, normalization, centralization, and similar operations.
Classes:

- `Centralize` – Centralizes the update.
- `ClipNorm` – Clips update norm to be no larger than `value`.
- `ClipNormByEMA` – Clips norm to be no larger than the norm of an exponential moving average of past updates.
- `ClipNormGrowth` – Clips update norm growth.
- `ClipValue` – Clips update magnitude to be within the `(-value, value)` range.
- `ClipValueByEMA` – Clips update magnitude to be no larger than the magnitude of an exponential moving average of past (unclipped) updates.
- `ClipValueGrowth` – Clips update value magnitude growth.
- `Normalize` – Normalizes the update.
- `NormalizeByEMA` – Sets the norm of the update to the norm of an exponential moving average of past updates.
Functions:

- `clip_grad_norm_` – Clips gradients of an iterable of parameters to a specified norm value.
- `clip_grad_value_` – Clips gradients of an iterable of parameters at a specified value.
- `normalize_grads_` – Normalizes gradients of an iterable of parameters to a specified norm value.
Centralize ¶
Bases: torchzero.core.transform.Transform
Centralizes the update.
Parameters:

- `dim` (`int | Sequence[int] | str | None`, default: `None`) – calculates the mean along those dimensions. If list/tuple, tensors are centralized along all dimensions in `dim` that they have. Can be set to `"global"` to centralize by the global mean of all gradients concatenated into a vector. Defaults to `None`.
- `inverse_dims` (`bool`, default: `False`) – if `True`, the `dim` argument is inverted, and all other dimensions are centralized.
- `min_size` (`int`, default: `2`) – minimal size of a dimension to centralize along it. Defaults to 2.
Examples:
Standard gradient centralization:
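A minimal sketch, assuming the `tz.Modular` pipeline and `tz.m` module namespace used throughout torchzero's docs; the model and learning rate are placeholders:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

# Centralize is placed first, so it acts on the raw gradient
opt = tz.Modular(
    model.parameters(),
    tz.m.Centralize(),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```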
References:

- Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I (pp. 635-652). Springer International Publishing. https://arxiv.org/abs/2004.01461
Source code in torchzero/modules/clipping/clipping.py
ClipNorm ¶
Bases: torchzero.core.transform.Transform
Clips update norm to be no larger than `value`.
Parameters:

- `max_norm` (`float`) – value to clip the norm to.
- `ord` (`float`, default: `2`) – norm order. Defaults to 2.
- `dim` (`int | Sequence[int] | str | None`, default: `None`) – calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to `"global"` to normalize by the global norm of all gradients concatenated into a vector. Defaults to `None`.
- `inverse_dims` (`bool`, default: `False`) – if `True`, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (`int`, default: `1`) – minimal number of elements in a parameter or slice to clip its norm. Defaults to 1.
- `target` (`str`, default: `'update'`) – what this affects.
Examples:
Gradient norm clipping:
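Placing the module before the inner optimizer clips raw gradients. A sketch under the same `tz.Modular`/`tz.m` assumptions as the other examples on this page:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNorm(max_norm=1.0),  # clips the gradient before Adam sees it
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```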
Update norm clipping:
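Placing it after the inner optimizer clips the update that optimizer produces instead (same assumptions):

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNorm(max_norm=1.0),  # clips Adam's update, not the raw gradient
    tz.m.LR(1e-2),
)
```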
Source code in torchzero/modules/clipping/clipping.py
ClipNormByEMA ¶
Bases: torchzero.core.transform.Transform
Clips norm to be no larger than the norm of an exponential moving average of past updates.
Parameters:

- `beta` (`float`, default: `0.99`) – beta for the exponential moving average. Defaults to 0.99.
- `ord` (`float`, default: `2`) – order of the norm. Defaults to 2.
- `eps` (`float`, default: `1e-06`) – epsilon for division. Defaults to 1e-6.
- `tensorwise` (`bool`, default: `True`) – if `True`, norms are calculated per-parameter; otherwise all parameters are treated as a single vector. Defaults to True.
- `max_ema_growth` (`float | None`, default: `1.5`) – if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
- `ema_init` (`str`, default: `'zeros'`) – how to initialize the exponential moving average on the first step: `"update"` to use the first update, or `"zeros"`. Defaults to 'zeros'.
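The original page shows no example here; a hypothetical placement sketch, assuming the same `tz.Modular`/`tz.m` entry points as the other examples:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNormByEMA(beta=0.99),  # clip update norm to the norm of an EMA of past updates
    tz.m.LR(1e-2),
)
```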
Source code in torchzero/modules/clipping/ema_clipping.py
ClipNormGrowth ¶
Bases: torchzero.core.transform.Transform
Clips update norm growth.
Parameters:

- `add` (`float | None`, default: `None`) – additive clipping; the next update norm is at most `previous norm + add`. Defaults to None.
- `mul` (`float | None`, default: `1.5`) – multiplicative clipping; the next update norm is at most `previous norm * mul`. Defaults to 1.5.
- `min_value` (`float | None`, default: `0.0001`) – minimum value for multiplicative clipping to prevent collapse to 0. The next norm is at most `max(prev_norm, min_value) * mul`. Defaults to 1e-4.
- `max_decay` (`float | None`, default: `2`) – bounds the tracked multiplicative clipping decay to prevent collapse to 0. The next norm is at most `max(previous norm * mul, max_decay)`. Defaults to 2.
- `ord` (`float`, default: `2`) – norm order. Defaults to 2.
- `parameterwise` (`bool`, default: `True`) – if `True`, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.
- `target` (`Literal`, default: `'update'`) – what to set on var. Defaults to "update".
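No example appears here on the original page; a hypothetical sketch under the same `tz.Modular`/`tz.m` assumptions:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNormGrowth(mul=1.5),  # each update norm may be at most 1.5x the previous one
    tz.m.LR(1e-2),
)
```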
Source code in torchzero/modules/clipping/growth_clipping.py
ClipValue ¶
Bases: torchzero.core.transform.Transform
Clips update magnitude to be within the `(-value, value)` range.
Parameters:

- `value` (`float`) – value to clip to.
- `target` (`str`, default: `'update'`) – refer to the `target` argument in the documentation.
Examples:
Gradient clipping:
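A sketch with the module placed before the inner optimizer, so raw gradients are clamped; same `tz.Modular`/`tz.m` assumptions as above:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValue(1.0),  # clamp raw gradient entries to (-1, 1)
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```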
Update clipping:
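Placed after the inner optimizer, the module clamps its update instead (same assumptions):

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValue(1.0),  # clamp Adam's update entries to (-1, 1)
    tz.m.LR(1e-2),
)
```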
Source code in torchzero/modules/clipping/clipping.py
ClipValueByEMA ¶
Bases: torchzero.core.transform.Transform
Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.
Parameters:

- `beta` (`float`, default: `0.99`) – beta for the exponential moving average. Defaults to 0.99.
- `ema_init` (`str`, default: `'zeros'`) – how to initialize the exponential moving average on the first step: `"update"` to use the first update, or `"zeros"`. Defaults to 'zeros'.
- `ema_tfm` (`Chainable | None`, default: `None`) – optional modules applied to the exponential moving average before clipping by it. Defaults to None.
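The original page shows no example here; a hypothetical sketch under the same `tz.Modular`/`tz.m` assumptions:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValueByEMA(beta=0.99),  # clamp update entries by the EMA's magnitude
    tz.m.LR(1e-2),
)
```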
Source code in torchzero/modules/clipping/ema_clipping.py
ClipValueGrowth ¶
Bases: torchzero.core.transform.TensorwiseTransform
Clips update value magnitude growth.
Parameters:

- `add` (`float | None`, default: `None`) – additive clipping; the next update is at most `previous update + add`. Defaults to None.
- `mul` (`float | None`, default: `1.5`) – multiplicative clipping; the next update is at most `previous update * mul`. Defaults to 1.5.
- `min_value` (`float | None`, default: `0.0001`) – minimum value for multiplicative clipping to prevent collapse to 0. The next update is at most `max(prev_update, min_value) * mul`. Defaults to 1e-4.
- `max_decay` (`float | None`, default: `2`) – bounds the tracked multiplicative clipping decay to prevent collapse to 0. The next update is at most `max(previous update * mul, max_decay)`. Defaults to 2.
- `target` (`Literal`, default: `'update'`) – what to set on var. Defaults to "update".
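No example appears here on the original page; a hypothetical sketch under the same assumptions:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValueGrowth(mul=1.5),  # bound per-entry growth of the update
    tz.m.LR(1e-2),
)
```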
Source code in torchzero/modules/clipping/growth_clipping.py
Normalize ¶
Bases: torchzero.core.transform.Transform
Normalizes the update.
Parameters:

- `norm_value` (`float`, default: `1`) – desired norm value.
- `ord` (`float`, default: `2`) – norm order. Defaults to 2.
- `dim` (`int | Sequence[int] | str | None`, default: `None`) – calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to `"global"` to normalize by the global norm of all gradients concatenated into a vector. Defaults to `None`.
- `inverse_dims` (`bool`, default: `False`) – if `True`, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (`int`, default: `1`) – minimal size of a dimension to normalize along it. Defaults to 1.
- `target` (`str`, default: `'update'`) – what this affects.
Examples:
Gradient normalization:
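A sketch with the module placed first so the raw gradient is normalized; same `tz.Modular`/`tz.m` assumptions as the other examples:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Normalize(norm_value=1.0),  # rescale the raw gradient to unit norm
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```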
Update normalization:
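Placed after the inner optimizer, the module normalizes its update instead (same assumptions):

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.Normalize(norm_value=1.0),  # rescale Adam's update to unit norm
    tz.m.LR(1e-2),
)
```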
Source code in torchzero/modules/clipping/clipping.py
NormalizeByEMA ¶
Bases: torchzero.modules.clipping.ema_clipping.ClipNormByEMA
Sets norm of the update to be the same as the norm of an exponential moving average of past updates.
Parameters:

- `beta` (`float`, default: `0.99`) – beta for the exponential moving average. Defaults to 0.99.
- `ord` (`float`, default: `2`) – order of the norm. Defaults to 2.
- `eps` (`float`, default: `1e-06`) – epsilon for division. Defaults to 1e-6.
- `tensorwise` (`bool`, default: `True`) – if `True`, norms are calculated per-parameter; otherwise all parameters are treated as a single vector. Defaults to True.
- `max_ema_growth` (`float | None`, default: `1.5`) – if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
- `ema_init` (`str`, default: `'zeros'`) – how to initialize the exponential moving average on the first step: `"update"` to use the first update, or `"zeros"`. Defaults to 'zeros'.
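The original page shows no example here; a hypothetical sketch under the same `tz.Modular`/`tz.m` assumptions:

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.NormalizeByEMA(beta=0.99),  # set update norm to the norm of an EMA of past updates
    tz.m.LR(1e-2),
)
```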
Source code in torchzero/modules/clipping/ema_clipping.py
clip_grad_norm_ ¶
clip_grad_norm_(params: Iterable[Tensor], max_norm: float | None, ord: Union[Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'], float, Tensor] = 2, dim: Union[int, Sequence[int], Literal['global'], NoneType] = None, inverse_dims: bool = False, min_size: int = 2, min_norm: float | None = None)
Clips gradients of an iterable of parameters to a specified norm value. Gradients are modified in-place.
Parameters:

- `params` (`Iterable[Tensor]`) – parameters with gradients to clip.
- `max_norm` (`float | None`) – value to clip the norm to.
- `ord` (`float`, default: `2`) – norm order. Defaults to 2.
- `dim` (`int | Sequence[int] | str | None`, default: `None`) – calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to `"global"` to normalize by the global norm of all gradients concatenated into a vector. Defaults to `None`.
- `min_size` (`int`, default: `2`) – minimal size of a dimension to normalize along it. Defaults to 2.
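A usage sketch; the import path is an assumption based on the "Source code" note below:

```python
import torch
from torchzero.modules.clipping.clipping import clip_grad_norm_  # assumed import path

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

model(torch.randn(4, 10)).sum().backward()
clip_grad_norm_(model.parameters(), max_norm=1.0)  # modifies .grad in-place
opt.step()
```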
Source code in torchzero/modules/clipping/clipping.py
clip_grad_value_ ¶
Clips gradients of an iterable of parameters at a specified value. Gradients are modified in-place.
Parameters:

- `params` (`Iterable[Tensor]`) – iterable of tensors with gradients to clip.
- `value` (`float | int`) – maximum allowed value of the gradients.
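A usage sketch, with the import path again assumed from the "Source code" note below:

```python
import torch
from torchzero.modules.clipping.clipping import clip_grad_value_  # assumed import path

model = torch.nn.Linear(10, 2)
model(torch.randn(4, 10)).sum().backward()
clip_grad_value_(model.parameters(), value=0.5)  # clamps each .grad entry in-place
```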
Source code in torchzero/modules/clipping/clipping.py
normalize_grads_ ¶
normalize_grads_(params: Iterable[Tensor], norm_value: float, ord: Union[Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'], float, Tensor] = 2, dim: Union[int, Sequence[int], Literal['global'], NoneType] = None, inverse_dims: bool = False, min_size: int = 1)
Normalizes gradients of an iterable of parameters to a specified norm value. Gradients are modified in-place.
Parameters:

- `params` (`Iterable[Tensor]`) – parameters with gradients to normalize.
- `norm_value` (`float`) – desired norm value.
- `ord` (`float`, default: `2`) – norm order. Defaults to 2.
- `dim` (`int | Sequence[int] | str | None`, default: `None`) – calculates norm along those dimensions. If list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to `"global"` to normalize by the global norm of all gradients concatenated into a vector. Defaults to `None`.
- `inverse_dims` (`bool`, default: `False`) – if `True`, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (`int`, default: `1`) – minimal size of a dimension to normalize along it. Defaults to 1.
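A usage sketch; the import path is an assumption based on the "Source code" note above:

```python
import torch
from torchzero.modules.clipping.clipping import normalize_grads_  # assumed import path

model = torch.nn.Linear(10, 2)
model(torch.randn(4, 10)).sum().backward()
normalize_grads_(model.parameters(), norm_value=1.0)  # rescales .grad in-place to the target norm
```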