Index

All modules are available in the tz.m namespace (e.g. tz.m.Adam). There are a lot of modules, so they are loosely split into sub-packages, although some are hard to categorize. You can also view all modules on a single (very long) page.
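
The snippet below is a minimal sketch of how modules from tz.m are chained into a single optimizer. It assumes the top-level tz.Modular constructor and the tz.m.Adam / tz.m.LR modules as shown in the project README; exact names may differ between versions.

```python
# Minimal sketch: chaining tz.m modules into one optimizer.
# Assumes tz.Modular, tz.m.Adam and tz.m.LR as in the project README.
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
opt = tz.Modular(model.parameters(), tz.m.Adam(), tz.m.LR(1e-2))

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```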

Optimization algorithms

  • Adaptive - adaptive per-parameter learning rate methods and other deep learning optimizers, such as Adam.
  • Momentum - momentums and exponential moving averages.
  • Conjugate gradient - conjugate gradient methods.
  • Quasi-newton - quasi-Newton methods that estimate the Hessian using gradient information.
  • Second order - "True" second order methods that use exact second order information.
  • Higher order - third and higher order methods (currently just a higher-order Newton method).
  • Gradient approximation - modules that estimate the gradient using function values.
  • Least-squares - least-squares methods (e.g. Gauss-Newton).

Step size selection

Auxiliary modules

  • Clipping - gradient clipping, normalization, centralization, etc.
  • Weight decay - weight decay.
  • Operations - operations on module outputs, such as addition, subtraction, grafting, tracking the maximum, etc.
  • Projections - allows any other modules to be run in a projected space. This has multiple uses: saving memory by projecting into a smaller subspace, splitting parameters into smaller blocks or merging them into a single vector, performing optimization in a different dtype, or viewing complex tensors as real. It can also do things like optimizing in the Fourier domain.
  • Smoothing - smoothing-based optimization; currently Laplacian and Gaussian smoothing are implemented.
  • Miscellaneous - many uncategorized modules, notably gradient accumulation, switching, automatic resetting, and random restarts.
  • Wrappers - implements Wrap, which can turn most custom PyTorch optimizers into chainable modules (see the sketch after this list).
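
As a rough illustration of the Wrappers entry above, here is a hedged sketch of wrapping a built-in PyTorch optimizer so it can be chained with other modules. The exact Wrap signature (optimizer class plus its keyword arguments) is an assumption; check the Wrappers page for the real one.

```python
# Hedged sketch: using a stock PyTorch optimizer as a chainable module.
# The Wrap signature (optimizer class + its kwargs) is assumed, not confirmed.
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
opt = tz.Modular(
    model.parameters(),
    tz.m.Wrap(torch.optim.SGD, lr=1.0),  # wrapped SGD produces the raw update
    tz.m.LR(1e-2),                       # scale the wrapped update by a learning rate
)
```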