Index

All modules are available in the tz.m namespace (e.g. tz.m.Adam). There are a lot of modules, so they are loosely split into sub-packages, although some are hard to categorize. You can also view all modules on a single (very long) page.
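
The snippet below is a minimal sketch of how modules from tz.m are chained into a single optimizer. It assumes the top-level tz.Modular constructor and the tz.m.Adam / tz.m.LR modules as shown in the project README; exact names may differ between versions.

```python
# Minimal sketch: chaining tz.m modules into one optimizer.
# Assumes tz.Modular, tz.m.Adam and tz.m.LR as in the project README.
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
opt = tz.Modular(model.parameters(), tz.m.Adam(), tz.m.LR(1e-2))

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```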

Optimization algorithms

  • Adaptive - adaptive per-parameter learning rate methods and other deep learning optimizers, such as Adam.
  • Momentum - momentums and exponential moving averages.
  • Conjugate gradient - conjugate gradient methods.
  • Quasi-newton - quasi-Newton methods that estimate the Hessian using gradient information.
  • Second order - "True" second order methods that use exact second order information.
  • Higher order - third and higher order methods (currently just a higher-order Newton method).
  • Gradient approximation - modules that estimate the gradient using function values.
  • Least-squares - least-squares methods (e.g. Gauss-Newton).

Step size selection

Auxiliary modules

  • Clipping - gradient clipping, normalization, centralization, etc.
  • Weight decay - weight decay.
  • Operations - operations on module outputs, such as addition, subtraction, grafting, tracking the maximum, etc.
  • Projections - allows any other modules to be run in a projected space. This has multiple uses: saving memory by projecting into a smaller subspace, splitting parameters into smaller blocks or merging them into a single vector, performing optimization in a different dtype, or viewing complex tensors as real. It can also do things like optimizing in the Fourier domain.
  • Smoothing - smoothing-based optimization; currently Laplacian and Gaussian smoothing are implemented.
  • Miscellaneous - many uncategorized modules, notably gradient accumulation, switching, automatic resetting, and random restarts.
  • Wrappers - implements Wrap, which can turn most custom PyTorch optimizers into chainable modules (see the sketch after this list).
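
As a rough illustration of the Wrappers entry above, here is a hedged sketch of wrapping a built-in PyTorch optimizer so it can be chained with other modules. The exact Wrap signature (optimizer class plus its keyword arguments) is an assumption; check the Wrappers page for the real one.

```python
# Hedged sketch: using a stock PyTorch optimizer as a chainable module.
# The Wrap signature (optimizer class + its kwargs) is assumed, not confirmed.
import torch
import torchzero as tz

model = torch.nn.Linear(10, 1)
opt = tz.Modular(
    model.parameters(),
    tz.m.Wrap(torch.optim.SGD, lr=1.0),  # wrapped SGD produces the raw update
    tz.m.LR(1e-2),                       # scale the wrapped update by a learning rate
)
```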