Clipping ¶
This subpackage contains modules for gradient/update clipping, normalization, centralization, and related operations.
Classes:

- `Centralize` – Centralizes the update.
- `ClipNorm` – Clips update norm to be no larger than `value`.
- `ClipNormByEMA` – Clips norm to be no larger than the norm of an exponential moving average of past updates.
- `ClipNormGrowth` – Clips update norm growth.
- `ClipValue` – Clips update magnitude to be within the `(-value, value)` range.
- `ClipValueByEMA` – Clips magnitude of update to be no larger than magnitude of an exponential moving average of past (unclipped) updates.
- `ClipValueGrowth` – Clips update value magnitude growth.
- `Normalize` – Normalizes the update.
- `NormalizeByEMA` – Sets norm of the update to be the same as the norm of an exponential moving average of past updates.
Functions:

- `clip_grad_norm_` – Clips gradients of an iterable of parameters to a specified norm value.
- `clip_grad_value_` – Clips gradients of an iterable of parameters at a specified value.
- `normalize_grads_` – Normalizes gradients of an iterable of parameters to a specified norm value.
Centralize ¶
Bases: torchzero.core.transform.TensorTransform
Centralizes the update.
Parameters:

- `dim` (int | Sequence[int] | str | None, default: None) – calculates the mean along those dimensions. If a list/tuple, tensors are centralized along all dimensions in `dim` that they have. Can be set to "global" to centralize by the global mean of all gradients concatenated into a vector. Defaults to None.
- `inverse_dims` (bool, default: False) – if True, the `dim` argument is inverted, and all other dimensions are centralized.
- `min_size` (int, default: 2) – minimal size of a dimension to centralize along it. Defaults to 2.
Examples:
Standard gradient centralization:
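A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used throughout these docs (the `Adam` and `LR` module names are assumptions from the rest of the library):

```python
import torch
import torchzero as tz  # assumed import alias, as in other torchzero examples

model = torch.nn.Linear(10, 2)

# Centralize placed before Adam centralizes the raw gradient (gradient centralization);
# placing it after Adam would centralize Adam's update instead.
opt = tz.Modular(
    model.parameters(),
    tz.m.Centralize(),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```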
References:

- Yong, H., Huang, J., Hua, X., & Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I (pp. 635-652). Springer International Publishing. https://arxiv.org/abs/2004.01461
Source code in torchzero/modules/clipping/clipping.py
ClipNorm ¶
Bases: torchzero.core.transform.TensorTransform
Clips update norm to be no larger than value.
Parameters:

- `max_norm` (float) – value to clip norm to.
- `ord` (float, default: 2) – norm order. Defaults to 2.
- `dim` (int | Sequence[int] | str | None, default: None) – calculates norm along those dimensions. If a list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to "global" to normalize by the global norm of all gradients concatenated into a vector. Defaults to None.
- `inverse_dims` (bool, default: False) – if True, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (int, default: 1) – minimal number of elements in a parameter or slice to clip norm. Defaults to 1.
- `target` (str) – what this affects.
Examples:
Gradient norm clipping:
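A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs; placing `ClipNorm` before `Adam` clips the raw gradient norm:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

# clip the gradient norm to 1 before it is fed to Adam
opt = tz.Modular(
    model.parameters(),
    tz.m.ClipNorm(max_norm=1.0),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```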
Update norm clipping:
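Placing `ClipNorm` after `Adam` clips the norm of Adam's update instead (same assumptions and setup as the sketch above):

```python
# clip the norm of the Adam update rather than the raw gradient
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNorm(max_norm=1.0),
    tz.m.LR(1e-2),
)
```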
Source code in torchzero/modules/clipping/clipping.py
ClipNormByEMA ¶
Bases: torchzero.core.transform.TensorTransform
Clips norm to be no larger than the norm of an exponential moving average of past updates.
Parameters:

- `beta` (float, default: 0.99) – beta for the exponential moving average. Defaults to 0.99.
- `ord` (float, default: 2) – order of the norm. Defaults to 2.
- `eps` (float, default: 1e-6) – epsilon for division. Defaults to 1e-6.
- `tensorwise` (bool, default: True) – if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.
- `max_ema_growth` (float | None, default: 1.5) – if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
- `ema_init` (str, default: 'zeros') – how to initialize the exponential moving average on the first step: "update" to use the first update, or "zeros". Defaults to 'zeros'.
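Examples:
A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

# clip the Adam update norm so it never exceeds the EMA of past update norms
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNormByEMA(beta=0.99),
    tz.m.LR(1e-2),
)
```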
Source code in torchzero/modules/clipping/ema_clipping.py
NORMALIZE class-attribute ¶
ClipNormGrowth ¶
Bases: torchzero.core.transform.TensorTransform
Clips update norm growth.
Parameters:

- `add` (float | None, default: None) – additive clipping; next update norm is at most `previous norm + add`. Defaults to None.
- `mul` (float | None, default: 1.5) – multiplicative clipping; next update norm is at most `previous norm * mul`. Defaults to 1.5.
- `min_value` (float | None, default: 0.0001) – minimum value for multiplicative clipping to prevent collapse to 0. Next norm is at most `max(prev_norm, min_value) * mul`. Defaults to 1e-4.
- `max_decay` (float | None, default: 2) – bounds the tracked multiplicative clipping decay to prevent collapse to 0. Next norm is at most `max(previous norm * mul, max_decay)`. Defaults to 2.
- `ord` (float, default: 2) – norm order. Defaults to 2.
- `tensorwise` (bool, default: True) – if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.
- `target` (Target) – what to set on var. Defaults to "update".
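Examples:
A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipNormGrowth(mul=1.5),  # update norm may grow at most 1.5x per step
    tz.m.LR(1e-2),
)
```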
Source code in torchzero/modules/clipping/growth_clipping.py
ClipValue ¶
Bases: torchzero.core.transform.TensorTransform
Clips update magnitude to be within (-value, value) range.
Parameters:

- `value` (float) – value to clip to.
- `target` (str) – refer to the `target` argument in the documentation.
Examples:
Gradient clipping:
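A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs; `ClipValue` before `Adam` clips raw gradient values:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

# clip each gradient element to the (-0.5, 0.5) range before Adam
opt = tz.Modular(
    model.parameters(),
    tz.m.ClipValue(0.5),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```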
Update clipping:
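Placing `ClipValue` after `Adam` clips the values of Adam's update instead (same assumptions and setup as the sketch above):

```python
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValue(0.5),  # clip the update, not the gradient
    tz.m.LR(1e-2),
)
```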
Source code in torchzero/modules/clipping/clipping.py
ClipValueByEMA ¶
Bases: torchzero.core.transform.TensorTransform
Clips magnitude of update to be no larger than magnitude of exponential moving average of past (unclipped) updates.
Parameters:

- `beta` (float, default: 0.99) – beta for the exponential moving average. Defaults to 0.99.
- `ema_init` (str, default: 'zeros') – how to initialize the exponential moving average on the first step: "update" to use the first update, or "zeros". Defaults to 'zeros'.
- `exp_avg_tfm` (Chainable | None, default: None) – optional modules applied to the exponential moving average before clipping by it. Defaults to None.
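Examples:
A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

# clip each update value by the magnitude of the EMA of past (unclipped) updates
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValueByEMA(beta=0.99),
    tz.m.LR(1e-2),
)
```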
Source code in torchzero/modules/clipping/ema_clipping.py
ClipValueGrowth ¶
Bases: torchzero.core.transform.TensorTransform
Clips update value magnitude growth.
Parameters:

- `add` (float | None, default: None) – additive clipping; next update is at most `previous update + add`. Defaults to None.
- `mul` (float | None, default: 1.5) – multiplicative clipping; next update is at most `previous update * mul`. Defaults to 1.5.
- `min_value` (float | None, default: 0.0001) – minimum value for multiplicative clipping to prevent collapse to 0. Next update is at most `max(prev_update, min_value) * mul`. Defaults to 1e-4.
- `max_decay` (float | None, default: 2) – bounds the tracked multiplicative clipping decay to prevent collapse to 0. Next update is at most `max(previous update * mul, max_decay)`. Defaults to 2.
- `target` (Target) – what to set on var. Defaults to "update".
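Examples:
A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.ClipValueGrowth(mul=1.5),  # each update value may grow at most 1.5x per step
    tz.m.LR(1e-2),
)
```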
Source code in torchzero/modules/clipping/growth_clipping.py
Normalize ¶
Bases: torchzero.core.transform.TensorTransform
Normalizes the update.
Parameters:

- `norm_value` (float, default: 1) – desired norm value.
- `ord` (float, default: 2) – norm order. Defaults to 2.
- `dim` (int | Sequence[int] | str | None, default: None) – calculates norm along those dimensions. If a list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to "global" to normalize by the global norm of all gradients concatenated into a vector. Defaults to None.
- `inverse_dims` (bool, default: False) – if True, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (int, default: 1) – minimal size of a dimension to normalize along it. Defaults to 1.
- `target` (str) – what this affects.
Examples:
Gradient normalization:
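A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs; `Normalize` before `Adam` normalizes the raw gradient:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

# normalize the gradient to unit norm before Adam
opt = tz.Modular(
    model.parameters(),
    tz.m.Normalize(norm_value=1),
    tz.m.Adam(),
    tz.m.LR(1e-2),
)
```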
Update normalization:
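Placing `Normalize` after `Adam` normalizes Adam's update instead (same assumptions and setup as the sketch above):

```python
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.Normalize(norm_value=1),
    tz.m.LR(1e-2),
)
```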
Source code in torchzero/modules/clipping/clipping.py
NormalizeByEMA ¶
Bases: torchzero.modules.clipping.ema_clipping.ClipNormByEMA
Sets norm of the update to be the same as the norm of an exponential moving average of past updates.
Parameters:

- `beta` (float, default: 0.99) – beta for the exponential moving average. Defaults to 0.99.
- `ord` (float, default: 2) – order of the norm. Defaults to 2.
- `eps` (float, default: 1e-6) – epsilon for division. Defaults to 1e-6.
- `tensorwise` (bool, default: True) – if True, norms are calculated parameter-wise; otherwise all parameters are treated as a single vector. Defaults to True.
- `max_ema_growth` (float | None, default: 1.5) – if specified, restricts how quickly the exponential moving average norm can grow. The norm is allowed to grow by at most this value per step. Defaults to 1.5.
- `ema_init` (str, default: 'zeros') – how to initialize the exponential moving average on the first step: "update" to use the first update, or "zeros". Defaults to 'zeros'.
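Examples:
A minimal sketch, assuming the `tz.Modular` / `tz.m` interface used elsewhere in these docs:

```python
import torch
import torchzero as tz  # Adam/LR module names assumed from the rest of the library

model = torch.nn.Linear(10, 2)

# rescale the Adam update so its norm matches the EMA of past update norms
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),
    tz.m.NormalizeByEMA(beta=0.99),
    tz.m.LR(1e-2),
)
```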
Source code in torchzero/modules/clipping/ema_clipping.py
NORMALIZE class-attribute ¶
clip_grad_norm_ ¶
clip_grad_norm_(params: Iterable[Tensor], max_norm: float | None, ord: Union[Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'], float, Tensor] = 2, dim: Union[int, Sequence[int], Literal['global'], NoneType] = None, inverse_dims: bool = False, min_size: int = 2, min_norm: float | None = None)
Clips gradient of an iterable of parameters to specified norm value. Gradients are modified in-place.
Parameters:

- `params` (Iterable[Tensor]) – parameters with gradients to clip.
- `max_norm` (float) – value to clip norm to.
- `ord` (float, default: 2) – norm order. Defaults to 2.
- `dim` (int | Sequence[int] | str | None, default: None) – calculates norm along those dimensions. If a list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to "global" to normalize by the global norm of all gradients concatenated into a vector. Defaults to None.
- `inverse_dims` (bool, default: False) – if True, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (int, default: 2) – minimal size of a dimension to normalize along it. Defaults to 2.
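Examples:
A minimal sketch; the import path below follows the source location listed here and is otherwise an assumption:

```python
import torch
from torchzero.modules.clipping.clipping import clip_grad_norm_  # import path assumed from "Source code" above

model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# clip the global gradient norm (all parameters treated as one vector) to 1.0, in-place
clip_grad_norm_(model.parameters(), max_norm=1.0, dim="global")
```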
Source code in torchzero/modules/clipping/clipping.py
clip_grad_value_ ¶
Clips gradient of an iterable of parameters at specified value. Gradients are modified in-place.
Parameters:

- `params` (Iterable[Tensor]) – iterable of tensors with gradients to clip.
- `value` (float or int) – maximum allowed value of gradient.
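Examples:
A minimal sketch (same import-path assumption as above):

```python
import torch
from torchzero.modules.clipping.clipping import clip_grad_value_  # import path assumed from "Source code" below

model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# clip every gradient element to the (-0.5, 0.5) range, in-place
clip_grad_value_(model.parameters(), 0.5)
```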
Source code in torchzero/modules/clipping/clipping.py
normalize_grads_ ¶
normalize_grads_(params: Iterable[Tensor], norm_value: float, ord: Union[Literal['mad', 'std', 'var', 'sum', 'l0', 'l1', 'l2', 'l3', 'l4', 'linf'], float, Tensor] = 2, dim: Union[int, Sequence[int], Literal['global'], NoneType] = None, inverse_dims: bool = False, min_size: int = 1)
Normalizes gradient of an iterable of parameters to specified norm value. Gradients are modified in-place.
Parameters:

- `params` (Iterable[Tensor]) – parameters with gradients to normalize.
- `norm_value` (float) – value to set the norm to.
- `ord` (float, default: 2) – norm order. Defaults to 2.
- `dim` (int | Sequence[int] | str | None, default: None) – calculates norm along those dimensions. If a list/tuple, tensors are normalized along all dimensions in `dim` that they have. Can be set to "global" to normalize by the global norm of all gradients concatenated into a vector. Defaults to None.
- `inverse_dims` (bool, default: False) – if True, the `dim` argument is inverted, and all other dimensions are normalized.
- `min_size` (int, default: 1) – minimal size of a dimension to normalize along it. Defaults to 1.
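Examples:
A minimal sketch (same import-path assumption as above):

```python
import torch
from torchzero.modules.clipping.clipping import normalize_grads_  # import path assumed from "Source code" above

model = torch.nn.Linear(4, 1)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# rescale gradients in-place so the global norm equals 1.0
normalize_grads_(model.parameters(), norm_value=1.0, dim="global")
```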