Index
All modules are available in the tz.m namespace (e.g. tz.m.Adam).
There are a lot of modules, so they are loosely grouped into sub-packages, although some can be hard to categorize. You can also view all modules on a single (very long) page.
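As a quick orientation, modules are meant to be chained together into a single optimizer. The minimal sketch below assumes a tz.Modular entry point and a tz.m.LR module alongside tz.m.Adam; those names are assumptions for illustration and the real constructors may differ.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

# Chain modules into one optimizer: an adaptive update rule followed by
# a learning-rate module. tz.Modular and tz.m.LR are assumed names.
opt = tz.Modular(
    model.parameters(),
    tz.m.Adam(),    # adaptive per-parameter update (listed in the index below)
    tz.m.LR(1e-2),  # final step size (assumed module name)
)

x, y = torch.randn(16, 10), torch.randn(16, 1)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```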
Optimization algorithms
- Adaptive - adaptive per-parameter learning rates and other deep learning optimizers, such as Adam.
- Momentum - momentum methods and exponential moving averages.
- Conjugate gradient - conjugate gradient methods.
- Quasi-newton - quasi-Newton methods that estimate the Hessian using gradient information.
- Second order - "True" second order methods that use exact second order information.
- Higher order - third and higher order methods (currently just a higher order Newton method).
- Gradient approximation - modules that estimate the gradient using function values.
- Least-squares - least-squares methods (e.g. Gauss-Newton).
Step size selection
- Step size - step size selection methods like Barzilai-Borwein and Polyak's step size.
- Line search - line search methods (a combined example follows this list).
- Trust region - trust region methods.
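Direction-producing modules from the previous section are typically paired with one of these step size modules. The sketch below illustrates that pattern; tz.m.NewtonCG, tz.m.Backtracking, and the closure convention with a backward flag are assumptions for illustration, not confirmed API.

```python
import torch
import torchzero as tz

def rosenbrock(xy):
    x, y = xy
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

params = torch.tensor([-1.1, 2.5], requires_grad=True)

# A module that proposes a search direction, followed by a line search
# that chooses how far to move along it (both names are assumed).
opt = tz.Modular([params], tz.m.NewtonCG(), tz.m.Backtracking())

for _ in range(50):
    def closure(backward=True):
        loss = rosenbrock(params)
        if backward:
            opt.zero_grad()
            loss.backward()
        return loss
    opt.step(closure)  # line searches generally need a closure to re-evaluate the loss
```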
Auxiliary modules
- Clipping - gradient clipping, normalization, centralization, etc.
- Weight decay - weight decay.
- Operations - operations like adding modules, subtracting, grafting, tracking the maximum, etc.
- Projections - allows any other modules to be used in some projected space. This has multiple uses: saving memory by projecting into a smaller subspace, splitting parameters into smaller blocks or merging them into a single vector, and performing optimization in a different dtype or viewing complex tensors as real. It can also do things like optimizing in the Fourier domain.
- Smoothing - smoothing-based optimization; currently Laplacian and Gaussian smoothing are implemented.
- Miscellaneous - various uncategorized modules, notably gradient accumulation, switching, automatic resetting, and random restarts.
- Wrappers - implements Wrap, which can turn most custom PyTorch optimizers into chainable modules (see the sketch below).
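For example, Wrap could be used to slot an existing PyTorch optimizer into a chain of modules. The sketch below only takes the Wrap name from the description above; the exact signature (optimizer class plus keyword arguments) and the tz.m.LR module are assumptions.

```python
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)

# Use a stock PyTorch optimizer as the update rule, then rescale its
# step with a separate learning-rate module. The Wrap signature and
# tz.m.LR are assumed for illustration.
opt = tz.Modular(
    model.parameters(),
    tz.m.Wrap(torch.optim.Adam, lr=1.0),  # inner optimizer computes the update
    tz.m.LR(1e-2),                        # outer module scales the final step
)
```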